Complementing these, clustering-based methods such as CisFinder use nucleotide substitution matrices to cluster motifs, which reduces redundancy. Tree-based methods such as Weeder and FMotif exploit tree structures, and graph-theoretic methods (e.g., WINNOWER) use graph representations, which shows how varied enumeration strategies can be.
In the enumerative approach, algorithms generate candidate motifs and evaluate each one. The simplest are word enumeration techniques, such as YMF and DREME, which scan the sequences systematically in search of short motifs.
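As a rough illustration of word enumeration, the Python sketch below counts how many input sequences contain each word of length k and reports the words found in every sequence. The function name `enumerate_words` and the toy sequences are assumptions made for this example; tools such as YMF and DREME additionally assess statistical significance, which is not modelled here.

```python
from collections import Counter

def enumerate_words(sequences, k):
    """Count, for every word of length k, how many sequences contain it."""
    support = Counter()
    for seq in sequences:
        # Use a set so each word is counted at most once per sequence.
        words = {seq[i:i + k] for i in range(len(seq) - k + 1)}
        support.update(words)
    return support

# Hypothetical toy input: three short DNA sequences sharing one 7-mer.
seqs = ["ACGTGACTCA", "TTGACTCAGG", "CCTGACTCAT"]
counts = enumerate_words(seqs, 7)
# Words present in every input sequence are the candidate motifs.
best = [word for word, n in counts.items() if n == len(seqs)]
print(best)
```

Exhaustive enumeration like this is only practical for short words, since the number of possible words grows as 4^k for DNA.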
Motif discovery happens in three major phases. The first is a pre-processing stage, in which sequences are prepared in assembly and cleaning steps. Assembly involves selecting sequences that contain the desired motif in large quantities and removing unwanted sequences using clustering. Cleaning
1465:
The first column specifies the position, the second column contains the number of occurrences of A at that position, the third column contains the number of occurrences of C at that position, the fourth column contains the number of occurrences of G at that position, the fifth column contains the
1683:
algorithms, featured in GAEM, GARP, and MACS, venture into pheromone-based exploration. These algorithms, mirroring nature's adaptability and cooperative dynamics, serve as avant-garde strategies for motif identification. The synthesis of heuristic techniques in hybrid approaches underscores the
The probabilistic approach uses probability models to identify motifs within sequences. MEME, a deterministic example, uses expectation-maximization to optimize position weight matrices (PWMs) and uncover conserved regions in unaligned DNA sequences.
1479:
The sequence motif discovery process has been well-developed since the 1990s. In particular, most of the existing motif discovery research focuses on DNA motifs. With the advances in high-throughput sequencing, such motif discovery problems are challenged by both the sequence pattern degeneracy
1747:
devised a code they called the "three-dimensional chain code" for representing the protein structure as a string of letters. This encoding scheme reveals the similarity between the proteins much more clearly than the amino acid sequence (example from article): The code encodes the
Motif discovery algorithms use diverse strategies to uncover patterns in DNA sequences. They include enumerative, probabilistic, and nature-inspired approaches, and combining multiple methods has proved effective in improving identification accuracy.
In contrast, stochastic methods such as Gibbs sampling start from random motif position assignments and iteratively refine the predictions. This probabilistic framework captures the uncertainty inherent in motif discovery.
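The following Python sketch loosely follows that description: it starts from random motif positions and then repeatedly resamples the position in one sequence, using a profile built from the other sequences. The function name `gibbs_motif_search`, the pseudocount of 1, the fixed iteration count, and the toy sequences are all assumptions for illustration. Published samplers also model the background distribution and monitor convergence, which this sketch omits.

```python
import random

def gibbs_motif_search(seqs, w, iterations=200, seed=0):
    """Return one motif start position per sequence (DNA alphabet only)."""
    rng = random.Random(seed)
    alphabet = "ACGT"
    # Random initial assignment of motif positions.
    pos = [rng.randrange(len(s) - w + 1) for s in seqs]
    for _ in range(iterations):
        for i, s in enumerate(seqs):
            # Build a count profile from all sequences except the held-out one.
            counts = [{a: 1.0 for a in alphabet} for _ in range(w)]
            for j, t in enumerate(seqs):
                if j != i:
                    for k in range(w):
                        counts[k][t[pos[j] + k]] += 1
            total = len(seqs) - 1 + len(alphabet)
            # Weight every candidate start by its probability under the profile.
            weights = []
            for start in range(len(s) - w + 1):
                p = 1.0
                for k in range(w):
                    p *= counts[k][s[start + k]] / total
                weights.append(p)
            # Sample the new position in proportion to those weights.
            pos[i] = rng.choices(range(len(weights)), weights=weights)[0]
    return pos

# Hypothetical toy input with a shared 7-mer planted in each sequence.
toy = ["ACGTGACTCAGG", "TTTGACTCAGGA", "CCATGACTCATT", "GGTGACTCACCA"]
positions = gibbs_motif_search(toy, 7, iterations=50, seed=1)
print(positions)
```

Because the procedure is stochastic, different seeds can give different answers, so in practice the sampler is usually run several times and the best-scoring result kept.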
1466:
number of occurrences of T at that position, and the last column contains the IUPAC notation for that position. Note that the sums of occurrences for A, C, G, and T for each row should be equal because the PFM is derived from aggregating several consensus sequences.
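The row-sum property can be checked directly. The Python sketch below uses the A, C, G and T counts of the AP-1 example from the TRANSFAC database given in this article; the names `PFM`, `row_sums` and `to_probabilities` are illustrative choices, not part of any library.

```python
# Each tuple holds the (A, C, G, T) counts for one position of the AP-1 PFM.
PFM = [
    (6, 2, 8, 1),
    (3, 5, 9, 0),
    (0, 0, 0, 17),
    (0, 0, 17, 0),
    (17, 0, 0, 0),
    (0, 16, 0, 1),
    (3, 2, 3, 9),
    (4, 7, 2, 4),
    (9, 6, 1, 1),
    (4, 3, 7, 3),
    (6, 3, 1, 7),
]

# Every row should sum to the number of aggregated sites.
row_sums = [sum(row) for row in PFM]

def to_probabilities(pfm):
    """Convert raw counts into per-position nucleotide frequencies."""
    return [[count / sum(row) for count in row] for row in pfm]

print(row_sums)
```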
1202:
A position frequency matrix (PFM) records the position-dependent frequency of each residue or nucleotide. PFMs can be experimentally determined from SELEX experiments or computationally discovered by tools such as MEME using hidden Markov
955:
a sequence of elements of the pattern notation matches a sequence of amino acids if and only if the latter sequence can be partitioned into subsequences in such a way that each pattern element matches the corresponding subsequence in
1658:
into their fabric for motif identification. The incorporation of
Bayesian clustering methods enhances the probabilistic foundation, providing a holistic framework for pattern recognition in DNA sequences.
. After motif representation, an objective function is chosen and a suitable search algorithm is applied to uncover the motifs. Finally, the post-processing stage involves evaluating the discovered motifs.
(PWM) contains log-odds weights for computing a match score. A cutoff is needed to decide whether an input sequence matches the motif. PWMs are calculated from PFMs and are also known as position-specific scoring matrices (PSSMs).
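A minimal Python sketch of how a PWM might be derived from a PFM and used for scoring is given below. It assumes a uniform background frequency of 0.25 per nucleotide and a pseudocount of 0.5 to avoid taking the logarithm of zero. Both values, the toy two-position PFM, the function names and the cutoff of 0.0 are assumptions made for illustration only.

```python
import math

def pfm_to_pwm(pfm, background=(0.25, 0.25, 0.25, 0.25), pseudocount=0.5):
    """Turn (A, C, G, T) counts into log2 odds against the background."""
    pwm = []
    for row in pfm:
        total = sum(row) + pseudocount * len(row)
        pwm.append([
            math.log2(((count + pseudocount) / total) / bg)
            for count, bg in zip(row, background)
        ])
    return pwm

def score(pwm, site, alphabet="ACGT"):
    """Sum the log-odds weights of a site that has the same length as the PWM."""
    if len(site) != len(pwm):
        raise ValueError("site length must equal motif length")
    return sum(row[alphabet.index(base)] for row, base in zip(pwm, site))

# Hypothetical two-position PFM strongly preferring "AC".
toy_pwm = pfm_to_pwm([(8, 0, 0, 0), (0, 8, 0, 0)])
cutoff = 0.0  # arbitrary threshold chosen for this example
print(score(toy_pwm, "AC") > cutoff, score(toy_pwm, "GT") > cutoff)
```

The choice of cutoff controls the trade-off between missed sites and false matches, so it is usually tuned for each motif rather than fixed in advance.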
, but does not indicate the likelihood of any particular match. For this reason, two or more patterns are often associated with a single motif: the defining pattern, and various typical patterns.
992:
Different pattern description notations have other ways of forming pattern elements. One of these notations is the PROSITE notation, described in the following subsection.
1516:(MEME) algorithm, which generates statistical information for each candidate. There are more than 100 publications detailing motif discovery algorithms; Weirauch
1497:
then ensures the removal of any confounding elements. Next there is the discovery stage. In this phase sequences are represented using consensus strings or
1026:
A string of characters drawn from the alphabet and enclosed in braces (curly brackets) denotes any amino acid except for those in the string. For example,
1198:
A matrix of numbers containing scores for each residue or nucleotide at each position of a fixed-length motif. There are two types of weight matrices.
1577:: human curators would select a pool of sequences known to be related and use computer programs to align them and produce the motif profile (Pfam uses
The fundamental idea behind all these notations is the matching principle, which assigns a meaning to a sequence of elements of the pattern notation:
1671:, epitomized by FMGA and MDGA, navigate motif search through genetic operators and specialized strategies. Harnessing swarm intelligence principles,
288:
530:
bind DNA in only its double-helical form. They are able to recognize motifs through contact with the double helix's major or minor groove.
1566:
indicates one member of a closely related family of amino acids. The authors were able to show that the motif has DNA binding activity.
1512:
There are software programs which, given multiple input sequences, attempt to identify one or more candidate motifs. One example is the
931:
any string of characters drawn from the alphabet enclosed in square brackets matches any one of the corresponding amino acids; e.g.
2204:
1536:
approach and studying similar genes in different species. For example, by aligning the amino acid sequences specified by the GCM (
2640:
Schiller MR (2007). "Minimotif miner: a computational tool to investigate protein function, disease, and genetic diversity".
signifies any amino acid, and the square brackets indicate an alternative (see below for further details about notation).
In 2017, MotifHyades was developed as a motif discovery tool that can be directly applied to paired sequences.
motif, but their amino acid sequences do not show much similarity, as shown in the table below. In 1997, Matsuda,
1691:
This chart shows many different types of algorithms used in the discovery of sequence motifs and their categories
1581:, which can be used to identify other related proteins. A phylogenic approach can also be used to enhance the
917:
Several notations for describing motifs are in use but most of them are variants of standard notations for
200:
98:
2500:"An approach to detection of protein structural motifs using an encoding scheme of backbone conformations"
1672:
1654:
taking center stage. LOGOS and BaMM, exemplifying this cohort, intricately weave
Bayesian approaches and
1008:
one-letter codes and conforms to the above description with the exception that a concatenation symbol, '
925:
there is an alphabet of single characters, each denoting a specific amino acid or a set of amino acids;
1012:', is used between pattern elements, but it is often dropped between letters of the pattern alphabet.
380:
Asn, followed by anything but Pro, followed by either Ser or Thr, followed by anything but Pro residue
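Because such patterns are variants of regular expressions, they can be translated mechanically. The Python sketch below converts a dash-separated PROSITE-style pattern into a regular expression for the standard `re` module and applies it to the N-glycosylation site described above, written here as `N-{P}-[ST]-{P}`. The helper names `prosite_to_regex` and `find_matches` and the example peptide are assumptions for illustration. The sketch requires explicit `-` separators and does not handle `>` inside square brackets.

```python
import re

# One pattern element: optional '<', a core, optional repetition, optional '>'.
_ELEMENT = re.compile(
    r"(<?)(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?(>?)"
)

def prosite_to_regex(pattern):
    """Translate a dash-separated PROSITE-style pattern into a regex string."""
    parts = []
    for element in pattern.split("-"):
        m = _ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        start, core, low, high, end = m.groups()
        if core == "x":                # any amino acid
            core = "."
        elif core.startswith("{"):     # any amino acid except those listed
            core = "[^" + core[1:-1] + "]"
        if low is not None:            # e(m) or e(m,n) repetition
            core += "{" + low + ("," + high if high else "") + "}"
        parts.append(("^" if start else "") + core + ("$" if end else ""))
    return "".join(parts)

def find_matches(pattern, sequence):
    """Return (start, matched_text) for each non-overlapping match."""
    regex = re.compile(prosite_to_regex(pattern))
    return [(m.start(), m.group()) for m in regex.finditer(sequence)]

# Hypothetical peptide: the second Asn is followed by Pro, so it is excluded.
print(find_matches("N-{P}-[ST]-{P}", "AANGSAANPSA"))
```

Note that `finditer` reports non-overlapping matches only, so overlapping occurrences of a motif would need a lookahead-based search instead.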
2613:
Altarawy D, Ismail MA, Ghanem S (2009). "MProfiler: A Profile-Based Method for DNA Motif
Discovery".
1850:
1545:
2499:
928:
a string of characters drawn from the alphabet denotes a sequence of the corresponding amino acids;
633:
406:
2738:
710:
637:
410:
178:
76:
2290:"MotifHyades: expectation maximization for de novo DNA motif pair discovery on paired sequences"
1825:
1498:
1207:
772:
Asn, followed by anything but Pro, followed by either Ser or Thr, followed by anything but Pro
1845:
557:
489:
2386:
Miller, Andrew K.; Print, Cristin G.; Nielsen, Poul M. F.; Crampin, Edmund J. (2010-11-18).
1667:
A distinct category unfolds, wherein algorithms draw inspiration from the biological realm.
If a pattern is restricted to the C-terminal of a sequence, the pattern is suffixed with '
1041:
If a pattern is restricted to the N-terminal of a sequence, the pattern is prefixed with '
902:
is sometimes equated with the IQ motif itself, but a more accurate description would be a
PROSITE allows the following pattern elements in addition to those described previously:
2461:. GECCO '05. New York, NY, USA: Association for Computing Machinery. pp. 447–452.
Weirauch MT, Cote A, Norel R, Annala M, Zhao Y, Riley TR, et al. (February 2013).
occurring in the pattern. Observed probabilities can be graphically represented using
of sequences, researchers search and find motifs using computer-based techniques of
2700:
2687:
Balla S, Thapar V, Verma S, Luong T, Faghri T, Huang CH, et al. (March 2006).
Evolving further, advanced motif discovery embraces sophisticated techniques, with
527:
518:. Some of these are believed to affect the shape of nucleic acids (see for example
1687:
533:
Short coding motifs, which appear to lack secondary structure, include those that
2688:
2653:
2412:
2255:
1865:
1855:
542:
372:
2622:
2459:
Proceedings of the 7th annual conference on
Genetic and evolutionary computation
2354:
2151:
Proceedings of the
National Academy of Sciences of the United States of America
1920:
Proceedings of the
National Academy of Sciences of the United States of America
1885:
1817:
1570:
561:
538:
534:
480:
of the protein. Nevertheless, motifs need not be associated with a distinctive
2147:"The gcm-motif: a novel DNA-binding motif conserved in Drosophila and mammals"
2098:"Evaluation of methods for modeling transcription factor sequence specificity"
2732:
2575:
Stormo GD (January 2000). "DNA binding sites: representation and discovery".
2421:
2171:
2015:
1949:
1860:
1749:
1684:
adaptability of these algorithms in the intricate domain of motif discovery.
1680:
836:. Sometimes patterns are defined in terms of a probabilistic model such as a
833:
515:
485:
469:
342:
2466:
496:
with such motifs need not deviate from the typical shape (e.g. the "B-form"
2712:
2671:
2596:
2567:
2439:
2372:
2315:
2274:
2131:
2082:
2033:
1940:
1655:
1533:
1524:
is another motif discovery method that is based on a combinatorial approach.
1488:
519:
493:
465:
2514:
2454:
2190:
1975:
2064:
1548:
in 1996. It spans about 150 amino acid residues, and begins as follows:
1179:
2231:"PhyloGibbs: a Gibbs sampling motif finder that incorporates phylogeny"
358:
354:
1735:
1721:
1059:' can also occur inside a terminating square bracket pattern, so that
2704:
2617:. Lecture Notes in Computer Science. Vol. 5780. pp. 13–23.
2113:
1957:
1835:
2507:
Pacific Symposium on Biocomputing. Pacific Symposium on Biocomputing
1811:
1215:
867:
549:
511:
2549:
2049:"MEME: discovering and analyzing DNA and protein sequence motifs"
1707:
1001:
473:
364:
1544:, Akiyama and others discovered a pattern which they called the
1480:
issues and the data-intensive computational scalability issues.
1712:
2534:"Viral infection and human disease--insights from minimotifs"
2453:
Che, Dongsheng; Song, Yinglei; Rasheed, Khaled (2005-06-25).
2228:
1998:
Hashim, Fatma A.; Mabrouk, Mai S.; Al-Atabany, Walid (2019).
1520:. evaluated many related algorithms in a 2013 benchmark. The
1005:
2689:"Minimotif Miner: a tool for investigating protein function"
1023:' can be used as a pattern element to denote any amino acid.
1830:
1574:
461:
457:
2229:
Siddharthan R, Siggia ED, van Nimwegen E (December 2005).
522:), but this is only sometimes the case. For example, many
2497:
2385:
1599:
497:
2145:
Akiyama Y, Hosoya T, Poole AM, Hotta Y (December 1996).
367:
that is widespread and usually assumed to be related to
2331:"DNA Motif Recognition Modeling from Protein Sequences"
2144:
2095:
2000:"Review of Different Sequence Motif Finding Algorithms"
1997:
2046:
965:
matches the six amino acid sequences corresponding to
2686:
1553:
WDIND*.*P..*...D.F.*W***.**.IYS**...A.*H*S*WAMRNTNNHN
2612:
2531:
2047:
Bailey TL, Williams N, Misleh C, Li WW (July 2006).
1807:
1611:
approach has been proposed to infer DNA motifs from
1492:
A flowchart depicting the process of motif discovery
824:does not give any indication of the probability of
1916:"The Effects of Sequence Context on DNA Curvature"
1585:MEME algorithm, with PhyloGibbs being an example.
2455:"MDGA: Motif discovery using a genetic algorithm"
1562:signifies a single amino acid or a gap, and each
843:
2730:
1913:
1700:
912:
898:. Since the last choice is so wide, the pattern
1993:
1991:
1989:
1987:
1985:
1588:
1527:
537:proteins for delivery to particular parts of a
2452:
2388:"A Bayesian search for transcriptional motifs"
2222:
1914:Dlakić, Mensur; Harrington, Rodney E. (1996).
1569:A similar approach is commonly used by modern
934:matches any of the amino acids represented by
560:. Such techniques belong to the discipline of
1532:Motifs have also been discovered by taking a
2532:Kadaveru K, Vyas J, Schiller MR (May 2008).
2498:Matsuda H, Taniguchi F, Hashimoto A (1997).
1982:
1756:. "W" always corresponds to an alpha helix.
1218:database for the transcription factor AP-1:
2639:
2491:
2322:
2281:
2089:
995:
866:For example, the defining sequence for the
767:-glycosylation site motif mentioned above:
2138:
1504:
2557:
2429:
2411:
2362:
2305:
2264:
2254:
2180:
2170:
2121:
2072:
2040:
2023:
2004:Avicenna Journal of Medical Biotechnology
1965:
1939:
1663:Nature-Inspired and Heuristic Algorithms:
476:; that is a stereotypical element of the
16:Nucleotide or amino-acid sequence pattern
1686:
1487:
336:
1499:Position-specific Weight Matrices (PWM)
571:
2731:
2574:
887:Usually, however, the first letter is
371:of the macromolecule. For example, an
341:A DNA sequence motif represented as a
188:Please improve this article by adding
2615:Pattern Recognition in Bioinformatics
456:When a sequence motif appears in the
289:make it understandable to non-experts
2642:Current Protocols in Protein Science
2605:
2328:
2287:
682:
649:adding citations to reliable sources
616:
575:
422:adding citations to reliable sources
389:
263:
161:
88:adding citations to reliable sources
59:
18:
1114:is equivalent to the repetition of
1100:is equivalent to the repetition of
503:Outside of gene exons, there exist
13:
2524:
1901:
1469:
1161:matches any sequence that matches
1030:denotes any amino acid other than
14:
2750:
1841:Multiple EM for Motif Elicitation
1673:Particle Swarm Optimization (PSO)
1514:Multiple EM for Motif Elicitation
591:This section has multiple issues.
34:This article has multiple issues.
1810:
1758:
1188:C-x(2,4)-C-x(3)--x(8)-H-x(3,5)-H
687:
621:
580:
526:that have affinity for specific
394:
268:
166:
64:
23:
2446:
2379:
1177:The signature of the C2H2-type
777:This pattern may be written as
599:or discuss these issues on the
75:needs additional citations for
42:or discuss these issues on the
2589:10.1093/bioinformatics/16.1.16
2197:
1907:
1695:
1603:motif recognition from protein
1086:are two decimal integers with
844:Motifs and consensus sequences
1:
2307:10.1093/bioinformatics/btx381
2059:(Web Server issue): W369-73.
1896:
1752:between alpha-carbons of the
1739: chain A) both have a
1701:Three-dimensional chain codes
1474:
1214:An example of a PFM from the
913:Pattern description notations
378:site motif can be defined as
190:secondary or tertiary sources
2654:10.1002/0471140864.ps0212s48
2648:(1). Wiley: 2.12.1–2.12.14.
2413:10.1371/journal.pone.0013897
2256:10.1371/journal.pcbi.0010067
1528:Phylogenetic motif discovery
803:means any amino acid except
7:
2623:10.1007/978-3-642-04031-3_2
1803:
1729:catabolite gene activator (
1677:Artificial Bee Colony (ABC)
1193:
921:and use these conventions:
713:the claims made and adding
385:
345:for the LexA-binding motif.
10:
2755:
2355:10.1016/j.isci.2018.09.003
2329:Wong KC (September 2018).
2235:PLOS Computational Biology
1619:Motif Discovery Algorithms
1078:is a pattern element, and
1851:Protein primary structure
1540:) gene in man, mouse and
2288:Wong KC (October 2017).
2172:10.1073/pnas.93.25.14912
996:PROSITE pattern notation
2538:Frontiers in Bioscience
2467:10.1145/1068009.1068080
1669:Genetic Algorithms (GA)
1637:Probabilistic Approach:
1019:The lower case letter '
510:and motifs within the "
2053:Nucleic Acids Research
1941:10.1073/pnas.93.9.3847
1831:Mammalian Motif Finder
1826:Biomolecular structure
1692:
1493:
1208:position weight matrix
1122:times for any integer
346:
177:relies excessively on
1846:Nucleic acid sequence
1725: chain A) and
1690:
1628:Enumerative Approach:
1491:
548:Within a sequence or
340:
2102:Nature Biotechnology
1767:Amino acid sequence
1592:motif pair discovery
1522:planted motif search
1484:Process of discovery
870:may be taken to be:
645:improve this section
572:Motif Representation
524:DNA binding proteins
488:" sequences are not
418:improve this section
84:improve this article
2404:2010PLoSO...513897M
2347:2018iSci....7..198W
2247:2005PLSCB...1...67S
2205:"Modelling in Pfam"
2163:1996PNAS...9314912A
1932:1996PNAS...93.3847D
1796:RQEIGQIVGCSRETVGRIL
1791:KWWWWWWGKCFKWWWWWWW
1781:LYDVAEYAGVSYQTVSRVV
1776:TWWWWWWWKCLKWWWWWWG
1613:DNA-binding domains
1609:Markov random field
1538:glial cells missing
919:regular expressions
894:choices resolve to
838:hidden Markov model
541:, or mark them for
506:regulatory sequence
492:into proteins, and
482:secondary structure
369:biological function
2065:10.1093/nar/gkl198
1881:Conserved sequence
1876:Short linear motif
1693:
1681:Cuckoo Search (CS)
1646:Advanced Approach:
1573:databases such as
1494:
1004:notation uses the
905:consensus sequence
698:possibly contains
566:consensus sequence
347:
2632:978-3-642-04030-6
2476:978-1-59593-010-1
2300:(19): 3028–3035.
1801:
1800:
1652:Bayesian modeling
1463:
1462:
1151:is equivalent to
961:Thus the pattern
761:
760:
753:
743:
742:
735:
700:original research
681:
680:
673:
614:
554:sequence analysis
528:DNA binding sites
520:RNA self-splicing
478:overall structure
454:
453:
446:
335:
334:
327:
317:
316:
309:
262:
261:
254:
236:
160:
159:
152:
134:
57:
2746:
2724:
2705:10.1038/nmeth856
2683:
2636:
2600:
2571:
2561:
2519:
2518:
2504:
2495:
2489:
2488:
2450:
2444:
2443:
2433:
2415:
2383:
2377:
2376:
2366:
2326:
2320:
2319:
2309:
2285:
2279:
2278:
2268:
2258:
2226:
2220:
2219:
2217:
2215:
2201:
2195:
2194:
2184:
2174:
2142:
2136:
2135:
2125:
2114:10.1038/nbt.2486
2093:
2087:
2086:
2076:
2044:
2038:
2037:
2027:
1995:
1980:
1979:
1969:
1943:
1926:(9): 3847–3852.
1911:
1891:Structural motif
1871:Structural motif
1820:
1815:
1814:
1797:
1792:
1782:
1777:
1759:
1754:protein backbone
1741:helix-turn-helix
1738:
1724:
1715:repressor LacI (
1679:algorithms, and
1565:
1561:
1554:
1221:
1220:
1189:
1172:
1168:
1164:
1160:
1154:
1150:
1137:
1133:
1129:
1125:
1121:
1117:
1113:
1107:
1103:
1099:
1093:
1089:
1085:
1081:
1077:
1070:
1066:
1062:
1058:
1051:
1044:
1037:
1033:
1029:
1022:
1011:
988:
984:
980:
976:
972:
968:
964:
945:
941:
937:
933:
907:for the IQ motif
901:
897:
893:
890:
883:
876:
862:
858:
854:
850:
831:
827:
823:
817:
813:
809:
806:
802:
798:
794:
790:
786:
781:
756:
749:
738:
731:
727:
724:
718:
715:inline citations
691:
690:
683:
676:
669:
665:
662:
656:
625:
617:
606:
584:
583:
576:
498:DNA double helix
470:structural motif
449:
442:
438:
435:
429:
398:
390:
330:
323:
312:
305:
301:
298:
292:
272:
271:
264:
257:
250:
246:
243:
237:
235:
201:"Sequence motif"
194:
170:
162:
155:
148:
144:
141:
135:
133:
99:"Sequence motif"
92:
68:
60:
49:
27:
26:
19:
2754:
2753:
2749:
2748:
2747:
2745:
2744:
2743:
2729:
2728:
2727:
2664:
2633:
2608:
2606:Primary sources
2603:
2544:(13): 6455–71.
2527:
2525:Further reading
2522:
2502:
2496:
2492:
2477:
2451:
2447:
2384:
2380:
2327:
2323:
2286:
2282:
2227:
2223:
2213:
2211:
2203:
2202:
2198:
2157:(25): 14912–6.
2143:
2139:
2094:
2090:
2045:
2041:
1996:
1983:
1912:
1908:
1904:
1902:Primary sources
1899:
1866:Sequence mining
1856:Protein I-sites
1816:
1809:
1806:
1795:
1790:
1780:
1775:
1730:
1716:
1703:
1698:
1605:
1594:
1563:
1559:
1552:
1542:D. melanogaster
1530:
1510:
1508:motif discovery
1477:
1472:
1470:Motif Discovery
1196:
1187:
1170:
1166:
1162:
1158:
1152:
1148:
1144:Some examples:
1135:
1131:
1127:
1123:
1119:
1115:
1111:
1105:
1101:
1097:
1091:
1087:
1083:
1079:
1075:
1068:
1064:
1060:
1056:
1055:The character '
1049:
1042:
1035:
1031:
1027:
1020:
1009:
998:
986:
982:
978:
974:
970:
966:
962:
943:
939:
935:
932:
915:
899:
895:
892:
888:
881:
874:
860:
856:
852:
849:
846:
829:
825:
822:
815:
811:
808:
804:
800:
796:
792:
788:
784:
779:
757:
746:
745:
744:
739:
728:
722:
719:
704:
692:
688:
677:
666:
660:
657:
642:
626:
585:
581:
574:
543:phosphorylation
450:
439:
433:
430:
415:
399:
388:
331:
320:
319:
318:
313:
302:
296:
293:
285:help improve it
282:
273:
269:
258:
247:
241:
238:
195:
193:
187:
183:primary sources
171:
156:
145:
139:
136:
93:
91:
81:
69:
28:
24:
17:
12:
11:
5:
2752:
2742:
2741:
2739:Bioinformatics
2726:
2725:
2693:Nature Methods
2684:
2663:978-0471140863
2662:
2637:
2631:
2609:
2607:
2604:
2602:
2601:
2577:Bioinformatics
2572:
2528:
2526:
2523:
2521:
2520:
2490:
2475:
2445:
2398:(11): e13897.
2378:
2321:
2294:Bioinformatics
2280:
2221:
2196:
2137:
2088:
2039:
2010:(2): 130–148.
1981:
1905:
1903:
1900:
1898:
1895:
1894:
1893:
1888:
1886:Protein domain
1883:
1878:
1873:
1868:
1863:
1858:
1853:
1848:
1843:
1838:
1833:
1828:
1822:
1821:
1818:Biology portal
1805:
1802:
1799:
1798:
1793:
1788:
1784:
1783:
1778:
1773:
1769:
1768:
1765:
1762:
1750:torsion angles
1702:
1699:
1697:
1694:
1604:
1598:
1593:
1587:
1571:protein domain
1063:matches both "
1053:
1046:
1039:
1024:
997:
994:
959:
958:
948:
947:
929:
926:
914:
911:
878:
877:
845:
842:
834:sequence logos
775:
774:
759:
758:
741:
740:
695:
693:
686:
679:
678:
629:
627:
620:
615:
589:
588:
586:
579:
573:
570:
562:bioinformatics
452:
451:
402:
400:
393:
387:
384:
376:-glycosylation
351:sequence motif
349:In biology, a
333:
332:
315:
314:
276:
274:
267:
260:
259:
174:
172:
165:
158:
157:
72:
70:
63:
58:
32:
31:
29:
22:
15:
9:
6:
4:
3:
2:
2108:(2): 126–34.
1861:Sequence logo
1764:3D chain code
1763:
1761:
1760:
1757:
1755:
1751:
1746:
1742:
1737:
1733:
1728:
1723:
1719:
1714:
1710:
1709:
1689:
1685:
1682:
1678:
1674:
1670:
1665:
1664:
1660:
1657:
1656:Markov models
1653:
1648:
1647:
1643:
1639:
1638:
1634:
1630:
1629:
1625:
1621:
1620:
1616:
1615:of proteins.
848:The notation
841:
839:
835:
821:The notation
819:
810:means either
782:
773:
770:
769:
768:
766:
763:Consider the
755:
752:
737:
734:
726:
716:
712:
708:
702:
701:
696:This article
694:
685:
684:
675:
672:
664:
654:
650:
646:
640:
639:
635:
630:This section
628:
624:
619:
618:
613:
611:
604:
603:
598:
597:
592:
587:
578:
577:
569:
567:
563:
559:
555:
551:
546:
544:
540:
536:
531:
529:
525:
521:
517:
516:satellite DNA
513:
509:
507:
501:
499:
495:
494:nucleic acids
491:
487:
483:
479:
475:
471:
467:
463:
459:
448:
445:
437:
427:
423:
419:
413:
412:
408:
403:This section
401:
397:
392:
391:
383:
381:
377:
375:
370:
366:
363:
360:
356:
352:
344:
343:sequence logo
339:
329:
326:
311:
308:
300:
290:
286:
280:
277:This article
275:
266:
265:
256:
253:
245:
234:
231:
227:
224:
220:
217:
213:
210:
206:
203: –
202:
198:
197:Find sources:
191:
185:
184:
180:
175:This article
173:
169:
164:
163:
154:
151:
143:
132:
129:
125:
122:
118:
115:
111:
108:
104:
101: –
100:
96:
95:Find sources:
89:
85:
79:
78:
73:This article
71:
67:
62:
61:
56:
54:
47:
46:
41:
40:
35:
30:
21:
20:
2699:(3): 175–7.
2696:
2692:
2645:
2641:
2614:
2583:(1): 16–23.
2580:
2576:
2550:10.2741/3166
2541:
2537:
2506:
2493:
2458:
2448:
2395:
2391:
2381:
2338:
2334:
2324:
2297:
2293:
2283:
2238:
2234:
2224:
2212:. Retrieved
2208:
2199:
2154:
2150:
2140:
2105:
2101:
2091:
2056:
2052:
2042:
2007:
2003:
1923:
1919:
1909:
1744:
1740:
1726:
1706:
1704:
1666:
1662:
1661:
1649:
1645:
1644:
1640:
1636:
1635:
1631:
1627:
1626:
1622:
1618:
1617:
1606:
1600:
1595:
1589:
1582:
1568:
1557:
1541:
1537:
1534:phylogenetic
1531:
1517:
1511:
1505:
1495:
1483:
1482:
1478:
1464:
1213:
1197:
1178:
1176:
1143:
1126:satisfying:
1014:
999:
991:
960:
954:
949:
916:
903:
886:
879:
865:
847:
820:
778:
776:
771:
764:
762:
747:
729:
720:
697:
667:
658:
643:Please help
631:
607:
600:
594:
593:Please help
590:
547:
532:
504:
502:
455:
440:
431:
416:Please help
404:
379:
373:
350:
348:
321:
303:
294:
278:
248:
239:
229:
222:
215:
208:
196:
176:
146:
137:
127:
120:
113:
106:
94:
82:Please help
77:verification
74:
50:
43:
37:
36:Please help
33:
2341:: 198–211.
2214:14 December
1696:Motif Cases
1607:In 2018, a
1183:domain is:
1180:zinc finger
900:IQxxxRGxxxR
891:, and both
564:. See also
514:", such as
297:August 2020
2509:: 280–91.
2241:(7): e67.
1897:References
1558:Here each
875:QxxxGxxxxx
723:March 2020
707:improve it
661:March 2020
596:improve it
556:, such as
490:translated
434:March 2020
359:amino-acid
355:nucleotide
242:March 2020
212:newspapers
179:references
140:March 2020
110:newspapers
39:improve it
2422:1932-6203
2016:2008-2835
1950:0027-8424
1836:MochiView
1546:GCM motif
711:verifying
632:does not
602:talk page
486:Noncoding
464:, it may
405:does not
45:talk page
2733:Category
2721:15571142
2713:16489333
2680:10406520
2672:18429315
2597:10812473
2568:18508672
2440:21124986
2392:PLOS ONE
2373:30267681
2335:iScience
2316:28633280
2275:16477324
2132:23354101
2083:16845028
2034:31057715
1804:See also
1711:lactose
1475:Overview
1216:TRANSFAC
1194:Matrices
1118:exactly
1104:exactly
1094:, then:
868:IQ motif
550:database
386:Overview
362:sequence
2559:2628544
2515:9390299
2485:7892935
2431:2987817
2400:Bibcode
2364:6153143
2343:Bibcode
2266:1309704
2243:Bibcode
2191:8962155
2159:Bibcode
2123:3687085
2074:1538909
2025:6490410
1976:8632978
1928:Bibcode
1727:E. coli
1708:E. coli
1601:De novo
1590:De novo
1583:de novo
1506:De novo
1203:models.
1171:x-x-x-x
1067:" and "
1002:PROSITE
799:= Thr;
795:= Ser,
791:= Pro,
787:= Asn,
780:N{P}{P}
705:Please
653:removed
638:sources
474:protein
472:" of a
426:removed
411:sources
365:pattern
283:Please
226:scholar
124:scholar
2719:
2711:
2678:
2670:
2660:
2629:
2595:
2566:
2556:
2513:
2483:
2473:
2438:
2428:
2420:
2371:
2361:
2314:
2273:
2263:
2189:
2179:
2130:
2120:
2081:
2071:
2032:
2022:
2014:
1974:
1964:
1956:
1948:
1787:3gapA
1772:1lccA
1745:et al.
1713:operon
1239:IUPAC
1159:x(2,4)
1134:<=
1130:<=
1112:e(m,n)
1108:times;
1090:<=
985:, and
880:where
851:means
807:; and
783:where
508:motifs
466:encode
228:
221:
214:
207:
199:
126:
119:
112:
105:
97:
2717:S2CID
2676:S2CID
2503:(PDF)
2481:S2CID
2182:26236
1967:39447
1958:39155
1954:JSTOR
1518:et al
1167:x-x-x
1153:x-x-x
1069:S>
1006:IUPAC
956:turn.
558:BLAST
535:label
468:the "
460:of a
353:is a
233:JSTOR
219:books
131:JSTOR
117:books
2709:PMID
2668:PMID
2658:ISBN
2627:ISBN
2593:PMID
2564:PMID
2511:PMID
2471:ISBN
2436:PMID
2418:ISSN
2369:PMID
2312:PMID
2271:PMID
2216:2023
2209:Pfam
2187:PMID
2128:PMID
2079:PMID
2030:PMID
2012:ISSN
1972:PMID
1946:ISSN
1736:3gap
1722:1lcc
1705:The
1579:HMMs
1575:Pfam
1149:x(3)
1098:e(m)
1082:and
1057:>
1050:>
1043:<
1028:{ST}
1000:The
636:any
634:cite
539:cell
512:junk
462:gene
458:exon
409:any
407:cite
205:news
103:news
2701:doi
2650:doi
2619:doi
2585:doi
2554:PMC
2546:doi
2463:doi
2426:PMC
2408:doi
2359:PMC
2351:doi
2302:doi
2261:PMC
2251:doi
2177:PMC
2167:doi
2118:PMC
2110:doi
2069:PMC
2061:doi
2020:PMC
1962:PMC
1936:doi
1732:PDB
1718:PDB
1224:Pos
1169:or
1165:or
1163:x-x
1074:If
1034:or
987:BEF
983:BDF
979:BCF
975:AEF
971:ADF
967:ACF
942:or
938:or
859:or
855:or
828:or
814:or
801:{X}
709:by
647:by
500:).
484:. "
420:by
357:or
287:to
181:to
86:by
PFM from the TRANSFAC database for the transcription factor AP-1:

Pos   A    C    G    T    IUPAC
01    6    2    8    1    R
02    3    5    9    0    S
03    0    0    0    17   T
04    0    0    17   0    G
05    17   0    0    0    A
06    0    16   0    1    C
07    3    2    3    9    T
08    4    7    2    4    N
09    9    6    1    1    M
10    4    3    7    3    N
11    6    3    1    7    W
1206:A
1071:".
1065:ST
1052:'.
1045:'.
989:.
981:,
977:,
973:,
969:,
909:.
840:.
818:.
605:.
568:.
545:.
382:.
192:.
48:.
2723:.
2703::
2697:3
2682:.
2652::
2635:.
2621::
2599:.
2587::
2570:.
2548::
2517:.
2487:.
2465::
2442:.
2410::
2402::
2396:5
2375:.
2353::
2345::
2339:7
2318:.
2304::
2277:.
2253::
2245::
2239:1
2218:.
2193:.
2169::
2161::
2134:.
2112::
2085:.
2063::
2036:.
1978:.
1938::
1930::
1564:*
1560:.
1456:7
1453:1
1450:3
1447:6
1436:3
1433:7
1430:3
1427:4
1416:1
1413:1
1410:6
1407:9
1396:4
1393:2
1390:7
1387:4
1376:9
1373:3
1370:2
1367:3
1356:1
1353:0
1347:0
1336:0
1333:0
1330:0
1316:0
1310:0
1307:0
1293:0
1290:0
1287:0
1276:0
1273:9
1270:5
1267:3
1256:1
1253:8
1250:2
1247:6
1236:T
1233:G
1230:C
1227:A
1173:.
1155:.
1138:.
1136:n
1132:k
1128:m
1124:k
1120:k
1116:e
1106:m
1102:e
1092:n
1088:m
1084:n
1080:m
1076:e
1061:S
1038:.
1036:T
1032:S
1021:x
1010:-
963:F
946:.
944:c
940:b
936:a
896:R
889:I
882:x
861:Z
857:Y
853:X
830:Y
826:X
816:Y
812:X
805:X
797:T
793:S
789:P
785:N
765:N
754:)
748:(
736:)
730:(
725:)
721:(
703:.
674:)
668:(
663:)
659:(
655:.
641:.
612:)
608:(
447:)
441:(
436:)
432:(
428:.
414:.
374:N
328:)
322:(
310:)
304:(
299:)
295:(
281:.
255:)
249:(
244:)
240:(
230:·
223:·
216:·
209:·
186:.
153:)
147:(
142:)
138:(
128:·
121:·
114:·
107:·
80:.
55:)
51:(
Text is available under the Creative Commons Attribution-ShareAlike License. Additional terms may apply.