Sequence motif - Knowledge


Complementing these, Clustering-Based Methods such as CisFinder employ nucleotide substitution matrices for motif clustering, effectively mitigating redundancy. Concurrently, Tree-Based Methods like Weeder and FMotif exploit tree structures, and Graph Theoretic-Based Methods (e.g., WINNOWER) employ graph representations, demonstrating the richness of enumeration strategies.

In the enumerative approach, algorithms exhaustively generate and evaluate candidate motifs. The earliest methods of this kind are simple word enumeration techniques, such as YMF and DREME, which systematically scan the sequences in search of short motifs.
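As a rough illustration of word enumeration, the sketch below counts how many input sequences contain each k-mer. It is a minimal sketch only: the function name `count_kmers` and the toy sequences are assumptions made for this example, and the statistical scoring that real tools such as YMF or DREME apply is not reproduced here.

```python
from collections import Counter

def count_kmers(sequences, k):
    """Count how many input sequences contain each k-mer at least once."""
    counts = Counter()
    for seq in sequences:
        # Use a set so a k-mer repeated within one sequence is counted once.
        seen = {seq[i:i + k] for i in range(len(seq) - k + 1)}
        counts.update(seen)
    return counts

# Hypothetical toy input: three short sequences sharing one 7-mer.
seqs = ["ACGTGACTCAT", "TTGACTCAGG", "CCATGACTCA"]
kmer_counts = count_kmers(seqs, 7)
top = kmer_counts.most_common(1)
print(top)
```

Words that occur in most or all sequences are candidate motifs; a real enumerative tool would additionally compare the counts against a background expectation.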

Motif discovery happens in three major phases. The first is a pre-processing stage, in which sequences are prepared through assembly and cleaning steps. Assembly involves selecting sequences that contain the desired motif in large quantities and removing unwanted sequences using clustering. Cleaning


The first column specifies the position, the second column contains the number of occurrences of A at that position, the third column contains the number of occurrences of C at that position, the fourth column contains the number of occurrences of G at that position, the fifth column contains the


algorithms, featured in GAEM, GARP, and MACS, venture into pheromone-based exploration. These algorithms, mirroring nature's adaptability and cooperative dynamics, serve as avant-garde strategies for motif identification. The synthesis of heuristic techniques in hybrid approaches underscores the

The probabilistic approach uses probability models to identify motifs within sequences. MEME, a deterministic example, uses Expectation-Maximization to optimize Position Weight Matrices (PWMs) and uncover conserved regions in unaligned DNA sequences.


The sequence motif discovery process has been well-developed since the 1990s. In particular, most of the existing motif discovery research focuses on DNA motifs. With the advances in high-throughput sequencing, such motif discovery problems are challenged by both the sequence pattern degeneracy


devised a code they called the "three-dimensional chain code" for representing the protein structure as a string of letters. This encoding scheme reveals the similarity between the proteins much more clearly than the amino acid sequence (example from article): The code encodes the

Motif discovery algorithms use diverse strategies to uncover patterns in DNA sequences. Integrating enumerative, probabilistic, and nature-inspired approaches demonstrates their adaptability, and combining multiple methods has proved effective in improving identification accuracy.

In contrast, stochastic methods such as Gibbs sampling start motif discovery from random motif position assignments and iteratively refine the predictions. This probabilistic framework captures the uncertainty inherent in motif discovery.
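The iterative refinement can be sketched in a few lines. The code below is a simplified site sampler, not the implementation of any published tool: the function name `gibbs_motif_search`, the pseudocount of 1, the uniform background, and the assumption of exactly one motif occurrence per sequence are all choices made for this illustration.

```python
import random

def gibbs_motif_search(seqs, w, iterations=200, seed=0):
    """Sample one motif start per sequence for a motif of width w."""
    rng = random.Random(seed)
    # Start from random motif positions, one per sequence.
    pos = [rng.randrange(len(s) - w + 1) for s in seqs]
    for _ in range(iterations):
        for i, seq in enumerate(seqs):
            # Build a count profile (pseudocount 1) from all other sequences.
            counts = [{b: 1.0 for b in "ACGT"} for _ in range(w)]
            for j, other in enumerate(seqs):
                if j != i:
                    for k in range(w):
                        counts[k][other[pos[j] + k]] += 1
            total = len(seqs) - 1 + 4
            # Weight every candidate start in the held-out sequence.
            weights = []
            for start in range(len(seq) - w + 1):
                p = 1.0
                for k in range(w):
                    p *= counts[k][seq[start + k]] / total
                weights.append(p)
            # Sample a new start in proportion to its weight.
            pos[i] = rng.choices(range(len(weights)), weights=weights)[0]
    return pos

# Hypothetical toy input; sequences may only contain A, C, G and T.
seqs = ["ACGTGACTCAT", "TTGACTCAGG", "CCATGACTCA"]
positions = gibbs_motif_search(seqs, 7)
print(positions)
```

Because the positions are sampled rather than chosen greedily, different seeds can give different answers; in practice several runs are compared and the best-scoring alignment is kept.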


number of occurrences of T at that position, and the last column contains the IUPAC notation for that position. Note that the sums of occurrences for A, C, G, and T for each row should be equal because the PFM is derived from aggregating several consensus sequences.
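The equal-row-sum property can be checked with a small sketch. The code below builds a PFM from a handful of aligned binding sites; the sites and the function name `build_pfm` are illustrative assumptions, not data from TRANSFAC, and the IUPAC column is omitted for brevity.

```python
def build_pfm(aligned_sites):
    """Return one {base: count} dict per alignment column."""
    length = len(aligned_sites[0])
    if any(len(site) != length for site in aligned_sites):
        raise ValueError("all sites must have the same length")
    return [
        {base: sum(site[i] == base for site in aligned_sites) for base in "ACGT"}
        for i in range(length)
    ]

# Hypothetical aligned sites, one per row.
sites = ["TGACTCA", "TGAGTCA", "TGACTCA", "TGACTAA"]
pfm = build_pfm(sites)
for pos, row in enumerate(pfm, start=1):
    print(f"{pos:02d}", row["A"], row["C"], row["G"], row["T"])

# Every row sums to the number of aligned sites.
row_sums = {sum(row.values()) for row in pfm}
```

Each row is one motif position, so every row sums to the number of sites that were aggregated (four in this toy case).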


A position frequency matrix (PFM) records the position-dependent frequency of each residue or nucleotide. PFMs can be experimentally determined from SELEX experiments or computationally discovered by tools such as MEME using hidden Markov


a sequence of elements of the pattern notation matches a sequence of amino acids if and only if the latter sequence can be partitioned into subsequences in such a way that each pattern element matches the corresponding subsequence in


into their fabric for motif identification. The incorporation of Bayesian clustering methods enhances the probabilistic foundation, providing a holistic framework for pattern recognition in DNA sequences.

After motif representation, an objective function is chosen and a suitable search algorithm is applied to uncover the motifs. Finally, the post-processing stage involves evaluating the discovered motifs.

A position weight matrix (PWM) contains log odds weights for computing a match score. A cutoff is needed to specify whether an input sequence matches the motif or not. PWMs are calculated from PFMs. PWMs are also known as PSSMs.

The notation [XYZ] means X or Y or Z, but does not indicate the likelihood of any particular match. For this reason, two or more patterns are often associated with a single motif: the defining pattern, and various typical patterns.

Different pattern description notations have other ways of forming pattern elements. One of these notations is the PROSITE notation, described in the following subsection.
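Because PROSITE patterns are close to regular expressions, a pattern can be translated mechanically. The following is a minimal sketch under stated assumptions: the function name `prosite_to_regex` is invented for this example, only the elements x, single letters, [..], {..}, the repetitions e(m) and e(m,n), and the '<' and '>' anchors are handled, and '>' inside square brackets is deliberately not supported.

```python
import re

# One pattern element: x, a letter, [..] or {..}, optionally followed by (m) or (m,n).
ELEMENT = re.compile(r"(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?")

def prosite_to_regex(pattern):
    """Translate a PROSITE pattern into a Python regular expression string."""
    pattern = pattern.rstrip(".")
    regex = ""
    if pattern.startswith("<"):      # pattern restricted to the N-terminal
        regex, pattern = "^", pattern[1:]
    anchored_end = pattern.endswith(">")  # pattern restricted to the C-terminal
    if anchored_end:
        pattern = pattern[:-1]
    for element in pattern.split("-"):
        m = ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        core, lo, hi = m.groups()
        if core == "x":              # x matches any amino acid
            core = "."
        elif core.startswith("{"):   # {..} excludes the listed amino acids
            core = "[^" + core[1:-1] + "]"
        if lo is not None:           # e(m) or e(m,n) repetition
            core += "{" + lo + ("," + hi if hi else "") + "}"
        regex += core
    return regex + ("$" if anchored_end else "")

glyco = prosite_to_regex("N-{P}-[ST]-{P}")
hit = re.search(glyco, "MKNGSA")
print(glyco, hit.group())
```

The N-glycosylation pattern becomes an ordinary character-class expression, so standard regex tooling can then be used to scan protein sequences.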

Multiple EM for Motif Elicitation (MEME) algorithm, which generates statistical information for each candidate. There are more than 100 publications detailing motif discovery algorithms; Weirauch

then ensures the removal of any confounding elements. Next there is the discovery stage. In this phase sequences are represented using consensus strings or


A string of characters drawn from the alphabet and enclosed in braces (curly brackets) denotes any amino acid except for those in the string. For example,


A matrix of numbers containing scores for each residue or nucleotide at each position of a fixed-length motif. There are two types of weight matrices.
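The relationship between the two matrix types can be sketched as follows: counts from a PFM are turned into log-odds weights, and a candidate site is scored by summing one weight per position. This is an illustration only. The function names, the pseudocount of 1, the uniform background of 0.25 and the cutoff of 0 are assumptions made for this example; the counts are positions 03 to 06 of the TRANSFAC AP-1 example given in this article.

```python
import math

def pfm_to_pwm(pfm, background=0.25, pseudocount=1.0):
    """Convert per-position counts into log2-odds weights."""
    pwm = []
    for row in pfm:
        total = sum(row.values()) + 4 * pseudocount
        pwm.append({
            base: math.log2(((row[base] + pseudocount) / total) / background)
            for base in "ACGT"
        })
    return pwm

def score(pwm, site):
    """Sum the weight of the observed base at each position."""
    return sum(column[base] for column, base in zip(pwm, site))

# Counts for positions 03-06 of the AP-1 example (consensus T, G, A, C).
pfm = [
    {"A": 0, "C": 0, "G": 0, "T": 17},
    {"A": 0, "C": 0, "G": 17, "T": 0},
    {"A": 17, "C": 0, "G": 0, "T": 0},
    {"A": 0, "C": 16, "G": 0, "T": 1},
]
pwm = pfm_to_pwm(pfm)
cutoff = 0.0  # hypothetical threshold; real cutoffs are tuned per motif
for site in ("TGAC", "ACGT"):
    print(site, round(score(pwm, site), 2), score(pwm, site) > cutoff)
```

The pseudocount avoids taking the logarithm of zero for bases never observed at a position; a site matching the consensus scores above the cutoff, while an unrelated site scores below it.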

A similar approach is commonly used by modern protein domain databases such as Pfam: human curators would select a pool of sequences known to be related and use computer programs to align them and produce the motif profile (Pfam uses

The fundamental idea behind all these notations is the matching principle, which assigns a meaning to a sequence of elements of the pattern notation:

Genetic Algorithms (GA), epitomized by FMGA and MDGA, navigate motif search through genetic operators and specialized strategies. Harnessing swarm intelligence principles,

bind DNA in only its double-helical form. They are able to recognize motifs through contact with the double helix's major or minor groove.


indicates one member of a closely related family of amino acids. The authors were able to show that the motif has DNA binding activity.


There are software programs which, given multiple input sequences, attempt to identify one or more candidate motifs. One example is the


any string of characters drawn from the alphabet enclosed in square brackets matches any one of the corresponding amino acids; e.g.


approach and studying similar genes in different species. For example, by aligning the amino acid sequences specified by the GCM (


Schiller MR (2007). "Minimotif miner: a computational tool to investigate protein function, disease, and genetic diversity".


signifies any amino acid, and the square brackets indicate an alternative (see below for further details about notation).

In 2017, MotifHyades was developed as a motif discovery tool that can be directly applied to paired sequences.


motif, but their amino acid sequences do not show much similarity, as shown in the table below. In 1997, Matsuda,


This chart shows many different types of algorithms used in the discovery of sequence motifs and their categories

, which can be used to identify other related proteins. A phylogenetic approach can also be used to enhance the

Several notations for describing motifs are in use but most of them are variants of standard notations for

"An approach to detection of protein structural motifs using an encoding scheme of backbone conformations"

taking center stage. LOGOS and BaMM, exemplifying this cohort, intricately weave Bayesian approaches and

there is an alphabet of single characters, each denoting a specific amino acid or a set of amino acids;

The PROSITE notation uses the IUPAC one-letter codes and conforms to the above description with the exception that a concatenation symbol, '-', is used between pattern elements, but it is often dropped between letters of the pattern alphabet.

Asn, followed by anything but Pro, followed by either Ser or Thr, followed by anything but Pro residue


Altarawy D, Ismail MA, Ghanem S (2009). "MProfiler: A Profile-Based Method for DNA Motif Discovery".


a string of characters drawn from the alphabet denotes a sequence of the corresponding amino acids;

"MotifHyades: expectation maximization for de novo DNA motif pair discovery on paired sequences"

Asn, followed by anything but Pro, followed by either Ser or Thr, followed by anything but Pro


Miller, Andrew K.; Print, Cristin G.; Nielsen, Poul M. F.; Crampin, Edmund J. (2010-11-18).


A distinct category unfolds, wherein algorithms draw inspiration from the biological realm.

If a pattern is restricted to the C-terminal of a sequence, the pattern is suffixed with '>'.

If a pattern is restricted to the N-terminal of a sequence, the pattern is prefixed with '<'.


is sometimes equated with the IQ motif itself, but a more accurate description would be a


PROSITE allows the following pattern elements in addition to those described previously:

GECCO '05. New York, NY, USA: Association for Computing Machinery. pp. 447–452.

Weirauch MT, Cote A, Norel R, Annala M, Zhao Y, Riley TR, et al. (February 2013).


occurring in the pattern. Observed probabilities can be graphically represented using


of sequences, researchers search and find motifs using computer-based techniques of


Balla S, Thapar V, Verma S, Luong T, Faghri T, Huang CH, et al. (March 2006).


Evolving further, advanced motif discovery embraces sophisticated techniques, with

Some of these are believed to affect the shape of nucleic acids (see for example

Short coding motifs, which appear to lack secondary structure, include those that

2688: 2653: 2412: 2255: 1865: 1855: 542: 372: 2622: 2459:

Proceedings of the 7th annual conference on Genetic and evolutionary computation

2354: 2151:

Proceedings of the National Academy of Sciences of the United States of America


Proceedings of the National Academy of Sciences of the United States of America


of the protein. Nevertheless, motifs need not be associated with a distinctive

"The gcm-motif: a novel DNA-binding motif conserved in Drosophila and mammals"

"Evaluation of methods for modeling transcription factor sequence specificity"

Stormo GD (January 2000). "DNA binding sites: representation and discovery".

2421: 2171: 2015: 1949: 1860: 1749: 1684:

adaptability of these algorithms in the intricate domain of motif discovery.

Sometimes patterns are defined in terms of a probabilistic model such as a

with such motifs need not deviate from the typical shape (e.g. the "B-form"

2712: 2671: 2596: 2567: 2439: 2372: 2315: 2274: 2131: 2082: 2033: 1940: 1655: 1533: 1524:

is another motif discovery method that is based on combinatorial approach.

1488: 519: 493: 465: 2514: 2454: 2190: 1975: 2064: 1548:

in 1996. It spans about 150 amino acid residues, and begins as follows:

"PhyloGibbs: a Gibbs sampling motif finder that incorporates phylogeny"

The character '>' can also occur inside a terminating square bracket pattern, so that

Lecture Notes in Computer Science. Vol. 5780. pp. 13–23.

Pacific Symposium on Biocomputing. Pacific Symposium on Biocomputing

"MEME: discovering and analyzing DNA and protein sequence motifs"

, Akiyama and others discovered a pattern which they called the

issues and the data-intensive computational scalability issues.

1712: 2534:"Viral infection and human disease--insights from minimotifs" 2453:

Che, Dongsheng; Song, Yinglei; Rasheed, Khaled (2005-06-25).


Hashim, Fatma A.; Mabrouk, Mai S.; Al-Atabany, Walid (2019).

et al. evaluated many related algorithms in a 2013 benchmark. The

"Minimotif Miner: a tool for investigating protein function"

The lower case letter 'x' can be used as a pattern element to denote any amino acid.

Siddharthan R, Siggia ED, van Nimwegen E (December 2005).

), but this is only sometimes the case. For example, many

Akiyama Y, Hosoya T, Poole AM, Hotta Y (December 1996).


that is widespread and usually assumed to be related to

"DNA Motif Recognition Modeling from Protein Sequences"

"Review of Different Sequence Motif Finding Algorithms"

matches the six amino acid sequences corresponding to


WDIND*.*P..*...D.F.*W***.**.IYS**...A.*H*S*WAMRNTNNHN


Bailey TL, Williams N, Misleh C, Li WW (July 2006).


approach has been proposed to infer DNA motifs from


A flowchart depicting the process of motif discovery

The notation [XY] does not give any indication of the probability of X or Y


In biology, a sequence motif is a nucleotide or amino-acid sequence pattern that is widespread and usually assumed to be related to biological function of the macromolecule.

The N-glycosylation site motif may be written as N{P}[ST]{P}, where N = Asn, P = Pro, S = Ser, T = Thr; {X} means any amino acid except X; and [XY] means either X or Y.

An example of a PFM from the TRANSFAC database for the transcription factor AP-1:

Pos  A   C   G   T   IUPAC
01   6   2   8   1   R
02   3   5   9   0   S
03   0   0   0   17  T
04   0   0   17  0   G
05   17  0   0   0   A
06   0   16  0   1   C
07   3   2   3   9   T
08   4   7   2   4   N
09   9   6   1   1   M
10   4   3   7   3   N
11   6   3   1   7   W

If e is a pattern element, and m and n are two decimal integers with m <= n, then:
e(m) is equivalent to the repetition of e exactly m times;
e(m,n) is equivalent to the repetition of e exactly k times for any integer k satisfying m <= k <= n.

Some examples:
x(3) is equivalent to x-x-x.
x(2,4) matches any sequence that matches x-x or x-x-x or x-x-x-x.
{ST} denotes any amino acid other than S or T.

The signature of the C2H2-type zinc finger domain is:
C-x(2,4)-C-x(3)-[LIVMFYWC]-x(8)-H-x(3,5)-H

In the GCM motif, each . signifies a single amino acid or a gap, and each * indicates one member of a closely related family of amino acids.

In 2018, a Markov random field approach was proposed to infer DNA motifs from DNA-binding domains of proteins.

Three-dimensional chain codes and amino acid sequences for the E. coli lactose operon repressor LacI (PDB: 1lcc chain A) and the E. coli catabolite gene activator (PDB: 3gap chain A):

       3D chain code        Amino acid sequence
1lccA  TWWWWWWWKCLKWWWWWWG  LYDVAEYAGVSYQTVSRVV
3gapA  KWWWWWWGKCFKWWWWWWW  RQEIGQIVGCSRETVGRIL

"W" always corresponds to an alpha helix.

See also: Biomolecular structure, Mammalian Motif Finder, MochiView, Multiple EM for Motif Elicitation, Nucleic acid sequence, Protein primary structure, Protein I-sites, Sequence logo, Sequence mining, Structural motif, Short linear motif, Conserved sequence, Protein domain.

Text is available under the Creative Commons Attribution-ShareAlike License. Additional terms may apply.
