FASTA format

File format for DNA or protein sequences

Filename extensions: .fasta, .fas, .fa, .fna, .ffn, .faa, .mpfa, .frn
Internet media type: text/x-fasta
Developed by: David J. Lipman, William R. Pearson
Initial release: 1985
Type of format: Bioinformatics
Extended from: ASCII for FASTA
Extended to: FASTQ format
Website: www.ncbi.nlm.nih.gov/BLAST/fasta.shtml

In bioinformatics and biochemistry, the FASTA format is a text-based format for representing either nucleotide sequences or amino acid (protein) sequences, in which nucleotides or amino acids are represented using single-letter codes.

The format allows for sequence names and comments to precede the sequences. It originated from the FASTA software package and has since become a near-universal standard in bioinformatics.

The simplicity of FASTA format makes it easy to manipulate and parse sequences using text-processing tools and scripting languages.

Overview

A sequence begins with a greater-than character (">") followed by a description of the sequence, all on a single line. The lines immediately following the description line are the sequence representation, with one letter per amino acid or nucleic acid. These lines are typically no more than 80 characters in length.

For example:

>MCHU - Calmodulin - Human, rabbit, bovine, rat, and chicken
MADQLTEEQIAEFKEAFSLFDKDGDGTITTKELGTVMRSLGQNPTEAELQDMINEVDADGNGTID
FPEFLTMMARKMKDTDSEEEIREAFRVFDKDGNGYISAAELRHVMTNLGEKLTDEEVDEMIREA
DIDGDGQVNYEEFVQMMTAK*
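
In the record layout, a ">" description line is followed by one or more sequence lines. A reader only has to group each description with the lines that follow it. The Python sketch below does that. The function name `read_fasta` is hypothetical, and the sketch assumes the modern convention in which every record starts with ">" (older ";" comment lines are not handled).

```python
from typing import Iterable, Iterator, List, Optional, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (description, sequence) pairs from FASTA-formatted lines.

    Each record starts with a '>' line. The sequence may be split over several
    lines or kept on one line. Blank lines are skipped.
    """
    description: Optional[str] = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if description is not None:
                yield description, "".join(chunks)
            description, chunks = line[1:].strip(), []
        elif description is None:
            raise ValueError("sequence data found before the first '>' line")
        else:
            chunks.append(line)
    if description is not None:
        yield description, "".join(chunks)


calmodulin = """>MCHU - Calmodulin - Human, rabbit, bovine, rat, and chicken
MADQLTEEQIAEFKEAFSLFDKDGDGTITTKELGTVMRSLGQNPTEAELQDMINEVDADGNGTID
FPEFLTMMARKMKDTDSEEEIREAFRVFDKDGNGYISAAELRHVMTNLGEKLTDEEVDEMIREA
DIDGDGQVNYEEFVQMMTAK*
"""

# Print each description together with the length of its joined sequence.
for description, sequence in read_fasta(calmodulin.splitlines()):
    print(description, len(sequence))
```

Because the function is a generator, large multi-record files can be processed one record at a time without loading every sequence into memory.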

Original format

The original FASTA/Pearson format is described in the documentation for the FASTA suite of programs. This documentation can be downloaded with any free distribution of FASTA (see fasta20.doc, fastaVN.doc, or fastaVN.me, where VN is the version number).

In the original format, a sequence was represented as a series of lines. Each line was no longer than 120 characters and usually did not exceed 80 characters. The limit was probably meant to let software preallocate fixed line sizes. At the time, most users relied on Digital Equipment Corporation (DEC) VT220 (or compatible) terminals, which could display 80 or 132 characters per line. Most people preferred the bigger font of the 80-character mode, so it became the recommended practice to use 80 characters or fewer (often 70) in FASTA lines. The width of a standard printed page is also 70 to 80 characters, depending on the font. Hence, 80 characters became the norm.
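
The original format's line-length conventions can be checked mechanically: no sequence line longer than 120 characters, and usually no more than 80. The following Python sketch uses a hypothetical helper, `audit_line_lengths`. Both thresholds are parameters. The sketch assumes that only sequence lines are subject to the limit, so it skips ">" and ";" lines.

```python
from typing import List, Tuple


def audit_line_lengths(text: str, hard_limit: int = 120,
                       soft_limit: int = 80) -> List[Tuple[int, int, str]]:
    """Flag sequence lines that exceed the historical FASTA line-length limits.

    Returns (line_number, length, level) tuples. The level is "error" above
    hard_limit and "warning" above soft_limit. Lines starting with '>' or ';'
    (descriptions and comments) are not checked in this sketch.
    """
    problems: List[Tuple[int, int, str]] = []
    for number, line in enumerate(text.splitlines(), start=1):
        if line.startswith((">", ";")):
            continue
        length = len(line)
        if length > hard_limit:
            problems.append((number, length, "error"))
        elif length > soft_limit:
            problems.append((number, length, "warning"))
    return problems
```

To follow the stricter habit of 70-character lines, lower the soft limit, for example `audit_line_lengths(text, soft_limit=70)`.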
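
The first line in a FASTA file started with either a ">" (greater-than) symbol or, less frequently, a ";" (semicolon), and was taken as a comment. Software ignored any subsequent lines starting with a semicolon. Since only the first comment was used, it quickly came to hold a summary description of the sequence, often starting with a unique library accession number. Over time it became commonplace to always use ">" for the first line and not to use ";" comments (which would otherwise be ignored).

The initial line (used for a unique description of the sequence) was followed by the actual sequence itself, written as a standard one-letter character string. Anything other than a valid character was ignored (including spaces, tabulators, asterisks, etc.). It was also common to end the sequence with an "*" (asterisk) character, by analogy with its use in PIR-formatted sequences. For the same reason, a blank line was often left between the description and the sequence. Below are a few sample sequences:

;LCBO - Prolactin precursor - Bovine
; a sample sequence in FASTA format
MDSKGSSQKGSRLLLLLVVSNLLLCQGVVSTPVCPNGPGNCQVSLRDLFDRAVMVSHYIHDLSS
EMFNEFDKRYAQGKGFITMALNSCHTSSLPTPEDKEQAQQTHHEVLMSLILGLLRSWNDPLYHL
VTEVRGMKGAPDAILSRAIEIEEENKRLLEGMEMIFGQVIPGAKETEPYPVWSGLPSLQTKDED
ARYSAFYNLLHCLRRDSSKIDTYLKLLNCRIIYNNNC*

>MCHU - Calmodulin - Human, rabbit, bovine, rat, and chicken
MADQLTEEQIAEFKEAFSLFDKDGDGTITTKELGTVMRSLGQNPTEAELQDMINEVDADGNGTID
FPEFLTMMARKMKDTDSEEEIREAFRVFDKDGNGYISAAELRHVMTNLGEKLTDEEVDEMIREA
DIDGDGQVNYEEFVQMMTAK*

>gi|5524211|gb|AAD44166.1| cytochrome b
LCLYTHIGRNIYYGSYLYSETWNTGIMLLLITMATAFMGYVLPWGQMSFWGATVITNLFSAIPYIGTNLV
EWIWGGFSVDKATLNRFFAFHFILPFTMVALAGVHLTFLHETGSNNPLGLTSDSDKIPFHPYYTIKDFLG
LLILILLLLLLALLSPDMLGDPDNHMPADPLNTPLHIKPEWYFLFAYAILRSVPNKLGGVLALFLSIVIL
GLMPFLHTSKHRSMMLRPLSQALFWTLTMDLLTLTWIGSQPVEYPYTIIGQMASILYFSIILAFLPIAGX
IENY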
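
Reading the original Pearson-style layout takes a little more care than reading the modern one, for three reasons:
- A record's first line may start with ";" as well as ">".
- Later ";" lines are comments.
- Characters such as spaces, tabs and the trailing "*" are not part of the sequence.

The Python sketch below works under those assumptions. `read_pearson` is a hypothetical name. The sketch keeps only ASCII letters, so it also drops gap characters, which suits the original format but not aligned sequences.

```python
import string
from typing import List, Optional, Tuple

VALID = set(string.ascii_letters)  # everything else (spaces, tabs, '*', digits) is ignored


def read_pearson(text: str) -> List[Tuple[str, str]]:
    """Parse original (Pearson-style) FASTA text into (description, sequence) pairs.

    Sketch assumptions:
    - The first line of the file may start with '>' or ';'.
    - Any later ';' line is a comment.
    - A '>' line always opens a new record.
    """
    records: List[Tuple[str, str]] = []
    description: Optional[str] = None
    residues: List[str] = []
    for line in text.splitlines():
        if description is None and line.startswith((">", ";")):
            description = line[1:].strip()      # summary line of the first record
        elif line.startswith(">"):
            records.append((description, "".join(residues)))
            description, residues = line[1:].strip(), []
        elif line.startswith(";"):
            continue                             # comment line, ignored
        elif description is not None:
            residues.extend(ch for ch in line if ch in VALID)
    if description is not None:
        records.append((description, "".join(residues)))
    return records
```

With this input, the second ";" line of the prolactin sample is skipped, and the final "*" of each sample sequence is discarded.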

A multiple-sequence FASTA format, or multi-FASTA format, is obtained by concatenating several single-sequence FASTA files into one file. This does not contradict the format. Only the first line in a FASTA file may start with a ";" or ">", so every subsequent sequence must start with a ">" to be taken as a separate sequence. This in turn reserves ">" exclusively for the sequence definition line. Thus, the examples above would form a multi-FASTA file if taken together.

Modern bioinformatics programs that rely on the FASTA format expect the sequence headers to be preceded by ">". The sequence is generally represented as "interleaved", that is, on multiple lines as in the above example. It may also be "sequential", that is, on a single line. Running different bioinformatics programs may require conversions between "sequential" and "interleaved" FASTA formats.
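
Converting between "interleaved" and "sequential" layouts only changes how each sequence is wrapped. One way to do it is to parse the records once and re-emit them. Below is a minimal Python sketch. The function names are hypothetical, and the default width of 60 characters is an assumption, not part of the format.

```python
from typing import List, Optional, Tuple

Record = Tuple[str, str]


def parse_records(text: str) -> List[Record]:
    """Collect (description, sequence) pairs, joining wrapped sequence lines."""
    records: List[Record] = []
    description: Optional[str] = None
    chunks: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            if description is not None:
                records.append((description, "".join(chunks)))
            description, chunks = line[1:], []
        elif line and description is not None:
            chunks.append(line)
    if description is not None:
        records.append((description, "".join(chunks)))
    return records


def to_sequential(records: List[Record]) -> str:
    """Write each sequence on a single line."""
    return "".join(f">{desc}\n{seq}\n" for desc, seq in records)


def to_interleaved(records: List[Record], width: int = 60) -> str:
    """Wrap each sequence over lines of at most `width` characters."""
    out: List[str] = []
    for desc, seq in records:
        out.append(f">{desc}\n")
        for start in range(0, len(seq), width):
            out.append(seq[start:start + width] + "\n")
    return "".join(out)
```

For example, `to_sequential(parse_records(text))` flattens a wrapped file, and `to_interleaved(records, width=80)` wraps it back to the traditional 80-column limit.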

Description line

The description line (defline), also called the header or identifier line, begins with ">". It gives a name and/or a unique identifier for the sequence, and may also contain additional information. In a deprecated practice, the header line sometimes contained more than one header, separated by a ^A (Control-A) character. In the original Pearson FASTA format, one or more comments may occur after the header, each distinguished by a semicolon at the beginning of the line. Some databases and bioinformatics applications do not recognize these comments and follow the NCBI FASTA specification instead. An example of a multiple-sequence FASTA file follows:

>SEQUENCE_1
MTEITAAMVKELRESTGAGMMDCKNALSETNGDFDKAVQLLREKGLGKAAKKADRLAAEG
LVSVKVSDDFTIAAMRPSYLSYEDLDMTFVENEYKALVAELEKENEERRRLKDPNKPEHK
IPQFASRKQLSDAILKEAEEKIKEELKAQGKPEKIWDNIIPGKMNSFIADNSQLDSKLTL
MGQFYVMDDKKTVEQVIAEKEKEFGGKIKIVEFICFEVGEGLEKKTEDFAAEVAAQL
>SEQUENCE_2
SATVSEINSETDFVAKNDQFIALTKDTTAHIQSNSLQSVEELHSSTINGVKFEEYLKSQI
ATIGENLVVRRFATLKAGANGVVNGYIHTNGRVGVVIAAACDSAEVASKSRDLLRQICMH
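
Many tools treat the first whitespace-delimited token of a description line as the identifier and the rest as free text. The Python sketch below splits a defline that way. It also separates headers joined by ^A (character code 1, written "\x01" in Python), as in the deprecated practice. The convention and the name `split_defline` are assumptions of this sketch, not rules stated by the format.

```python
from typing import List, Tuple

CONTROL_A = "\x01"  # ^A, once used to join several headers on one line (deprecated)


def split_defline(line: str) -> List[Tuple[str, str]]:
    """Split a description line into (identifier, description) pairs.

    Sketch assumptions:
    - The identifier is the first whitespace-delimited token.
    - The rest of the header is free text.
    - Headers joined with ^A are returned as separate pairs.
    """
    if not line.startswith(">"):
        raise ValueError("a description line must start with '>'")
    pairs: List[Tuple[str, str]] = []
    for header in line[1:].rstrip("\r\n").split(CONTROL_A):
        parts = header.split(maxsplit=1)
        identifier = parts[0] if parts else ""
        description = parts[1].strip() if len(parts) > 1 else ""
        pairs.append((identifier, description))
    return pairs
```

Applied to ">gi|5524211|gb|AAD44166.1| cytochrome b", the sketch returns the NCBI-style identifier as one token. That token can then be parsed further.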

NCBI identifiers

The NCBI defined a standard for the unique identifier used for the sequence (SeqID) in the header line. This allows a sequence that was obtained from a database to be labelled with a reference to its database record. NCBI tools such as makeblastdb and table2asn understand this database identifier format. The following list describes the NCBI FASTA defined format for sequence identifiers.

Type: format(s); example(s)
local (i.e. no database reference): lcl|integer, lcl|string; lcl|123, lcl|hmm271
GenInfo backbone seqid: bbs|integer; bbs|123
GenInfo backbone moltype: bbm|integer; bbm|123
GenInfo import ID: gim|integer; gim|123
GenBank: gb|accession|locus; gb|M73307|AGMA13GT
EMBL: emb|accession|locus; emb|CAM43271.1|
PIR: pir|accession|name; pir||G36364
SWISS-PROT: sp|accession|name; sp|P01013|OVAX_CHICK
patent: pat|country|patent|sequence-number; pat|US|RE33188|1
pre-grant patent: pgp|country|application-number|sequence-number; pgp|EP|0238993|7
RefSeq: ref|accession|name; ref|NM_010450.1|
general database reference (a reference to a database that's not in this list): gnl|database|integer, gnl|database|string; gnl|taxon|9606, gnl|PID|e1632
GenInfo integrated database: gi|integer; gi|21434723
DDBJ: dbj|accession|locus; dbj|BAC85684.1|
PRF: prf|accession|name; prf||0806162C
PDB: pdb|entry|chain; pdb|1I4L|D
third-party GenBank: tpg|accession|name; tpg|BK003456|
third-party EMBL: tpe|accession|name; tpe|BN000123|
third-party DDBJ: tpd|accession|name; tpd|FAA00017|
TrEMBL: tr|accession|name; tr|Q90RT2|Q90RT2_9HIV1

The vertical bars ("|") in the above list are not separators in the sense of the Backus–Naur form; they are part of the format. Multiple identifiers can be concatenated, also separated by vertical bars.
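
Each NCBI database tag is followed by a fixed number of fields. For example, `gb` takes accession and locus, and `pat` takes country, patent and sequence number. Because of this, a concatenated SeqID can be split by walking the "|"-separated tokens. Below is a minimal Python sketch. The `FIELD_COUNTS` table is read off the NCBI formats for this illustration, and `parse_seqid` is a hypothetical helper, not an NCBI API.

```python
from typing import List, Tuple

# Number of '|'-separated fields that follow each NCBI database tag, e.g.
# lcl|integer -> 1, gb|accession|locus -> 2, pat|country|patent|sequence-number -> 3.
FIELD_COUNTS = {
    "lcl": 1, "bbs": 1, "bbm": 1, "gim": 1, "gi": 1,
    "gb": 2, "emb": 2, "pir": 2, "sp": 2, "ref": 2, "gnl": 2, "dbj": 2,
    "prf": 2, "pdb": 2, "tpg": 2, "tpe": 2, "tpd": 2, "tr": 2,
    "pat": 3, "pgp": 3,
}


def parse_seqid(seqid: str) -> List[Tuple[str, List[str]]]:
    """Split a possibly concatenated NCBI SeqID into (tag, fields) pairs."""
    tokens = seqid.split("|")
    parsed: List[Tuple[str, List[str]]] = []
    i = 0
    while i < len(tokens):
        tag = tokens[i]
        if tag == "" and i == len(tokens) - 1:
            break  # ignore a stray trailing '|'
        if tag not in FIELD_COUNTS:
            raise ValueError(f"unknown database tag {tag!r}")
        count = FIELD_COUNTS[tag]
        fields = tokens[i + 1:i + 1 + count]
        fields += [""] * (count - len(fields))  # tolerate omitted trailing fields
        parsed.append((tag, fields))
        i += 1 + count
    return parsed
```

For instance, "gi|5524211|gb|AAD44166.1|" splits into a `gi` entry and a `gb` entry with an empty locus. Real-world headers can deviate from these formats, so the sketch raises an error on unknown tags rather than guessing.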

Sequence representation

Following the header line, the actual sequence is represented. Sequences may be protein sequences or nucleic acid sequences, and they can contain gaps or alignment characters (see sequence alignment). Sequences are expected to be represented in the standard IUB/IUPAC amino acid and nucleic acid codes, with these exceptions:
- Lower-case letters are accepted and are mapped to upper-case.
- A single hyphen or dash can be used to represent a gap character.
- In amino acid sequences, U and * are acceptable letters (see below).

Numerical digits are not allowed, but some databases use them to indicate the position in the sequence. The nucleic acid codes supported are:

Code  Meaning                           Mnemonic
A     A                                 Adenine
C     C                                 Cytosine
G     G                                 Guanine
T     T                                 Thymine
U     U                                 Uracil
i     i                                 inosine (non-standard)
R     A or G (I)                        puRine
Y     C, T or U                         pYrimidines
K     G, T or U                         bases which are Ketones
M     A or C                            bases with aMino groups
S     C or G                            Strong interaction
W     A, T or U                         Weak interaction
B     not A (i.e. C, G, T or U)         B comes after A
D     not C (i.e. A, G, T or U)         D comes after C
H     not G (i.e. A, C, T or U)         H comes after G
V     neither T nor U (i.e. A, C or G)  V comes after U
N     A C G T U                         Nucleic acid
-     gap of indeterminate length
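
The nucleotide rules can be applied in a few lines of code:
- Lower case is accepted and mapped to upper case.
- "-" is a gap.
- Ambiguity codes such as R (A or G) or N (any base) stand for sets of bases.

Below is a minimal Python sketch. Two details are assumptions of this illustration: digits are treated as position markers and dropped, and inosine is left out. The helper names are hypothetical.

```python
# IUPAC nucleotide codes mapped to the bases each one can stand for.
# Inosine ('i', non-standard) is deliberately left out of this sketch.
IUPAC_NUCLEOTIDES = {
    "A": "A", "C": "C", "G": "G", "T": "T", "U": "U",
    "R": "AG", "Y": "CTU", "K": "GTU", "M": "AC", "S": "CG", "W": "ATU",
    "B": "CGTU", "D": "AGTU", "H": "ACTU", "V": "ACG", "N": "ACGTU",
    "-": "",  # gap of indeterminate length
}


def normalize_nucleotides(raw: str) -> str:
    """Upper-case a nucleotide sequence, dropping whitespace and position digits.

    Raises ValueError if any remaining character is not an IUPAC nucleotide code.
    """
    cleaned = "".join(ch for ch in raw if not ch.isspace() and not ch.isdigit())
    sequence = cleaned.upper()
    unknown = sorted(set(sequence) - set(IUPAC_NUCLEOTIDES))
    if unknown:
        raise ValueError(f"not IUPAC nucleotide codes: {unknown}")
    return sequence


def code_matches(code: str, base: str) -> bool:
    """True if the (possibly ambiguous) code can stand for the given base."""
    return base.upper() in IUPAC_NUCLEOTIDES[code.upper()]
```

Keeping the codes as a mapping from each code to the bases it covers makes ambiguity checks a simple membership test. For example, `code_matches("R", "g")` is true and `code_matches("Y", "A")` is false.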
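
The amino acid codes supported (22 amino acids and 3 special codes) are:

Code  Meaning
A     Alanine
B     Aspartic acid (D) or Asparagine (N)
C     Cysteine
D     Aspartic acid
E     Glutamic acid
F     Phenylalanine
G     Glycine
H     Histidine
I     Isoleucine
J     Leucine (L) or Isoleucine (I)
K     Lysine
L     Leucine
M     Methionine/Start codon
N     Asparagine
O     Pyrrolysine (rare)
P     Proline
Q     Glutamine
R     Arginine
S     Serine
T     Threonine
U     Selenocysteine (rare)
V     Valine
W     Tryptophan
Y     Tyrosine
Z     Glutamic acid (E) or Glutamine (Q)
X     any
*     translation stop
-     gap of indeterminate length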
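
Every letter of the alphabet is used by some amino acid code, so checking letters alone says little about a protein sequence. It is more informative to count residues by category. The categories are:
- the 20 standard amino acids
- the rare pyrrolysine (O) and selenocysteine (U)
- the ambiguity codes B, J, Z and X
- the "*" stop
- the "-" gap

The Python sketch below does this. The grouping and the name `classify_residues` are choices made for this illustration.

```python
from typing import Dict

STANDARD = set("ACDEFGHIKLMNPQRSTVWY")  # the 20 standard amino acids
RARE = set("OU")                        # pyrrolysine, selenocysteine
AMBIGUOUS = set("BJZX")                 # D/N, L/I, E/Q, and 'any'


def classify_residues(sequence: str) -> Dict[str, int]:
    """Count the residues of a protein sequence by category of one-letter code.

    Lower-case letters are treated as upper-case and whitespace is skipped.
    Any other unexpected character raises ValueError.
    """
    counts = {"standard": 0, "rare": 0, "ambiguous": 0, "stop": 0, "gap": 0}
    for ch in sequence.upper():
        if ch in STANDARD:
            counts["standard"] += 1
        elif ch in RARE:
            counts["rare"] += 1
        elif ch in AMBIGUOUS:
            counts["ambiguous"] += 1
        elif ch == "*":
            counts["stop"] += 1
        elif ch == "-":
            counts["gap"] += 1
        elif not ch.isspace():
            raise ValueError(f"unexpected character {ch!r}")
    return counts
```

A high "ambiguous" count, for example many X residues, is often worth flagging before downstream analysis.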

FASTA file

Filename extension

There is no standard filename extension for a text file containing FASTA formatted sequences. The table below shows each extension and its respective meaning.

Extension        Meaning                             Notes
fasta, fas, fa   generic FASTA                       Any generic FASTA file
fna              FASTA nucleic acid                  Used generically to specify nucleic acids
ffn              FASTA nucleotide of gene regions    Contains coding regions for a genome
faa              FASTA amino acid                    Contains amino acid sequences
mpfa             FASTA amino acids                   Contains multiple protein sequences
frn              FASTA non-coding RNA                Contains non-coding RNA regions for a genome, e.g. tRNA, rRNA
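
The extensions are conventions rather than a standard, so a program can only guess a file's content from its name. Below is a minimal Python lookup over the conventional meanings (.fasta/.fas/.fa, .fna, .ffn, .faa, .mpfa, .frn). `guess_fasta_kind` is a hypothetical helper, and compressed names such as ".fa.gz" are deliberately not recognized.

```python
from pathlib import Path
from typing import Optional

# Conventional meanings of FASTA filename extensions (no official standard exists).
FASTA_EXTENSIONS = {
    ".fasta": "generic FASTA",
    ".fas": "generic FASTA",
    ".fa": "generic FASTA",
    ".fna": "FASTA nucleic acid",
    ".ffn": "FASTA nucleotide of gene regions",
    ".faa": "FASTA amino acid",
    ".mpfa": "FASTA amino acids",
    ".frn": "FASTA non-coding RNA",
}


def guess_fasta_kind(filename: str) -> Optional[str]:
    """Return the conventional meaning of a file's FASTA extension, or None."""
    return FASTA_EXTENSIONS.get(Path(filename).suffix.lower())
```

Since the extension may be wrong or missing, it is safer to confirm the content by reading the sequence characters than to trust the name alone.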

Compression

Compressing FASTA files requires a specific compressor that handles both channels of information: identifiers and sequence. For better results, the two are usually split into separate streams that are compressed independently of each other. For example, the MFCompress algorithm performs lossless compression of these files using context modelling and arithmetic encoding. Genozip, a software package for compressing genomic files, uses an extensible context-based model. Benchmarks of FASTA file compression algorithms have been reported by Hosseini et al. in 2016 and by Kryukov et al. in 2020.
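
The two-stream idea (headers in one stream, sequence residues in another, each compressed on its own) can be sketched with Python's standard `lzma` module. LZMA here only stands in for a real FASTA compressor. This is not how MFCompress or Genozip work internally; they use dedicated context models. The sketch also assumes at least one record, and it does not preserve the original line wrapping.

```python
import lzma
from typing import List, Tuple


def compress_two_streams(text: str) -> Tuple[bytes, bytes]:
    """Compress FASTA headers and sequences as two independent LZMA streams.

    Sketch assumptions:
    - The text holds at least one record, and every record starts with '>'.
    - Sequence line wrapping is not preserved.
    """
    records: List[Tuple[str, List[str]]] = []
    for line in text.splitlines():
        if line.startswith(">"):
            records.append((line[1:], []))
        elif records:
            records[-1][1].append(line.strip())
    headers = "\n".join(header for header, _ in records)
    sequences = "\n".join("".join(parts) for _, parts in records)
    return lzma.compress(headers.encode("utf-8")), lzma.compress(sequences.encode("ascii"))


def decompress_two_streams(header_blob: bytes, sequence_blob: bytes) -> str:
    """Rebuild single-line (sequential) FASTA text from the two streams."""
    headers = lzma.decompress(header_blob).decode("utf-8").split("\n")
    sequences = lzma.decompress(sequence_blob).decode("ascii").split("\n")
    return "".join(f">{h}\n{s}\n" for h, s in zip(headers, sequences))
```

Splitting the streams lets each compressor see homogeneous data: free-text identifiers in one stream and a small residue alphabet in the other.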

Encryption

FASTA files can be encrypted with various tools, including Cryfa and Genozip. Cryfa uses AES encryption and also compresses the data. Similarly, Genozip can encrypt FASTA files with AES-256 during compression.
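
Extensions

FASTQ format is a form of FASTA format extended to indicate information related to sequencing. It was created by the Sanger Centre in Cambridge.

A2M/A3M are a family of FASTA-derived formats used for sequence alignments. In A2M/A3M sequences, lowercase characters are taken to mean insertions. These insertion positions are indicated in the other sequences by the dot (".") character, and the dots can be discarded for compactness without loss of information. As with typical FASTA files used in alignments, the gap ("-") is taken to mean exactly one position. A3M is similar to A2M, with the added rule that gaps aligned to insertions can also be discarded.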
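
The A2M/A3M rules translate directly into string operations:
- Dropping "." characters turns an A2M row into its compact A3M form.
- Keeping only upper-case letters and "-" recovers the match columns, which line up across all rows.

The Python sketch below shows both. The two-row alignment is a made-up example, and the function names are hypothetical, not those of any existing tool.

```python
from typing import List


def a2m_to_a3m(row: str) -> str:
    """Drop the '.' characters that pad insertion columns in an A2M row."""
    return row.replace(".", "")


def match_states(row: str) -> str:
    """Keep match columns only: upper-case residues and '-' gaps.

    Works on A2M and A3M rows alike, because lower-case insertions and
    '.' padding are both discarded.
    """
    return "".join(ch for ch in row if ch.isupper() or ch == "-")


# Hypothetical A2M alignment: row 1 has a two-residue insertion ("gt") that
# row 2 pads with dots.
a2m_rows: List[str] = ["ACgtD-E", "AC..DFE"]
a3m_rows = [a2m_to_a3m(row) for row in a2m_rows]
print(a3m_rows)                                  # ['ACgtD-E', 'ACDFE']
print([match_states(row) for row in a3m_rows])   # ['ACD-E', 'ACDFE']
```

After the dots are removed, the A3M rows no longer share a common length. The match columns still have equal length, because each "-" keeps its single position.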

Working with FASTA files

Many user-friendly scripts are available from the community to perform FASTA file manipulations. Online toolboxes, such as FaBox or the FASTX-Toolkit within Galaxy servers, are also available. Among other functions, these can be used to:
- segregate sequence headers/identifiers
- rename or shorten them
- extract sequences of interest from large FASTA files based on a list of wanted identifiers

A tree-based approach to sorting multi-FASTA files (TREE2FASTA) also exists. It is based on the coloring and/or annotation of sequences of interest in the FigTree viewer. Additionally, the Bioconductor Biostrings package can be used to read and manipulate FASTA files in R.
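
Extracting the sequences named in a list of wanted identifiers is one of the most common manipulations. Below is a minimal Python sketch. `extract_by_ids` is a hypothetical name, and the sketch assumes that the identifier is the first whitespace-delimited token after ">". Matching lines are copied unchanged, so the original wrapping is kept.

```python
from typing import Iterable, List


def extract_by_ids(text: str, wanted_ids: Iterable[str]) -> str:
    """Return FASTA text containing only the records whose identifier is wanted.

    Sketch assumption: a record's identifier is the first whitespace-delimited
    token after '>'.
    """
    wanted = set(wanted_ids)
    kept: List[str] = []
    keep = False
    for line in text.splitlines(keepends=True):
        if line.startswith(">"):
            tokens = line[1:].split()
            keep = bool(tokens) and tokens[0] in wanted
        if keep:
            kept.append(line)
    return "".join(kept)
```

The function streams line by line, so the same logic works on very large files if it is applied to an open file handle instead of a string.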

Several online format converters exist to rapidly reformat multi-FASTA files into other formats (e.g. NEXUS, PHYLIP) for use with different phylogenetic programs, such as the converter available on phylogeny.fr.
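
The following Python sketch shows one such conversion: already-aligned FASTA records written as sequential PHYLIP. It follows the classic strict PHYLIP layout. The first line gives the number of sequences and the alignment length. Each following line holds a name cut or padded to 10 characters, then that sequence. Relaxed PHYLIP variants and real converters handle more cases. `records_to_phylip` is a hypothetical name.

```python
from typing import List, Tuple


def records_to_phylip(records: List[Tuple[str, str]]) -> str:
    """Write aligned (name, sequence) pairs as sequential, strict-style PHYLIP text.

    Sketch assumptions:
    - Every sequence is already aligned to the same length.
    - Names are cut or padded to a 10-character field.
    Truncation can make two names identical; this sketch does not check for that.
    """
    if not records:
        raise ValueError("no sequences to write")
    lengths = {len(sequence) for _, sequence in records}
    if len(lengths) != 1:
        raise ValueError("PHYLIP needs aligned sequences of equal length")
    lines = [f"{len(records)} {lengths.pop()}"]
    for name, sequence in records:
        label = name[:10].ljust(10)  # fixed-width name field
        lines.append(label + sequence)
    return "\n".join(lines) + "\n"
```

The records can come from any FASTA reader, as long as the sequences form an alignment.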

See also

The FASTQ format, used to represent DNA sequencer reads along with quality scores.
The SAM and CRAM formats, used to represent genome sequencer reads that have been aligned to genome sequences.
The GVF format (Genome Variation Format), an extension based on the GFF3 format.

References

Lipman DJ, Pearson WR (March 1985). "Rapid and sensitive protein similarity searches". Science. 227 (4693): 1435–41. doi:10.1126/science.2983426.
Pearson WR, Lipman DJ (April 1988). "Improved tools for biological sequence comparison". Proceedings of the National Academy of Sciences of the United States of America. 85 (8): 2444–8. doi:10.1073/pnas.85.8.2444.
Cock PJ, Fields CJ, Goto N, Heuer ML, Rice PM (April 2010). "The Sanger FASTQ file format for sequences with quality scores, and the Solexa/Illumina FASTQ variants". Nucleic Acids Research. 38 (6): 1767–71. doi:10.1093/nar/gkp1137.
"What is FASTA format?". Zhang Lab.
Landsteiner N (2019-02-20). "(Now Go Bang!) Raster CRT Typography (According to DEC)". Now Go Bang! (mass:werk blog).
"VT220 Built-in Glyphs". VT100.
"Why is 80 characters the 'standard' limit for code width?". Software Engineering Stack Exchange.
"FASTA Database Format". www.loc.gov. 2023-08-01.
NCBI C++ Toolkit Book. National Center for Biotechnology Information.
Tao T (2011-08-24). "Single Letter Codes for Nucleotides". National Center for Biotechnology Information.
"IUPAC code table". NIAS DNA Bank.
"anysymbol". MAFFT - a multiple sequence alignment program.
"Alignment Fileformats". 22 May 2019.
Pinho AJ, Pratas D (January 2014). "MFCompress: a compression tool for FASTA and multi-FASTA data". Bioinformatics. 30 (1): 117–8. doi:10.1093/bioinformatics/btt594.
Lan D, Tobler R, Souilmi Y, Llamas B (2021-02-15). "Genozip: a universal extensible genomic data compressor". Bioinformatics. 37 (16): 2225–2230. doi:10.1093/bioinformatics/btab102.
Hosseini M, Pratas D, Pinho AJ (2016). "A Survey on Data Compression Methods for Biological Sequences". Information. 7 (4): 56. doi:10.3390/info7040056.
Kryukov K, Ueda MT, Nakagawa S, Imanishi T (July 2020). "Sequence Compression Benchmark (SCB) database—A comprehensive evaluation of reference-free compressors for FASTA-formatted sequences". GigaScience. 9 (7): giaa072. doi:10.1093/gigascience/giaa072.
Pratas D, Hosseini M, Pinho A (2017). "Cryfa: a tool to compact and encrypt FASTA files". 11th International Conference on Practical Applications of Computational Biology & Bioinformatics (PACBB). Advances in Intelligent Systems and Computing. Vol. 616. Springer. pp. 305–312. doi:10.1007/978-3-319-60816-7_37.
Hosseini M, Pratas D, Pinho AJ (2019-01-01). Berger B (ed.). "Cryfa: a secure encryption tool for genomic data". Bioinformatics. 35 (1): 146–148. doi:10.1093/bioinformatics/bty645.
"Description of A2M alignment format". SAMtools.
"soedinglab/hh-suite: reformat.pl". GitHub. 20 November 2022.
Villesen P (2007). "FaBox: an online toolbox for fasta sequences". Molecular Ecology Notes. 7 (6): 965–968. doi:10.1111/j.1471-8286.2007.01821.x.
Blankenberg D, Von Kuster G, Bouvier E, Baker D, Afgan E, Stoler N, Galaxy Team, Taylor J, Nekrutenko A (2014). "Dissemination of scientific software with Galaxy ToolShed". Genome Biology. 15 (2): 403. doi:10.1186/gb4161.
Sauvage T, Plouviez S, Schmidt WE, Fredericq S (March 2018). "TREE2FASTA: a flexible Perl script for batch extraction of FASTA sequences from exploratory phylogenetic trees". BMC Research Notes. 11 (1): 403. doi:10.1186/s13104-018-3268-y.
Pagès H, Aboyoun P, Gentleman R, DebRoy S (2018). "Biostrings: Efficient manipulation of biological strings". R package version 2.48.0. Bioconductor.org. doi:10.18129/B9.bioc.Biostrings.
Dereeper A, Guignon V, Blanc G, Audic S, Buffet S, Chevenet F, Dufayard JF, Guindon S, Lefort V, Lescot M, Claverie JM, Gascuel O (July 2008). "Phylogeny.fr: robust phylogenetic analysis for the non-specialist". Nucleic Acids Research. 36 (Web Server issue): W465–9. doi:10.1093/nar/gkn180.
2094:PMC 2086:doi 1790:PMC 1782:doi 1738:PMC 1728:doi 1674:doi 1662:227 1626:SAM 961:ine 819:or 682:PDB 658:PRF 449:PIR 144:In 122:www 3278:: 3077:() 2790:, 2786:, 2782:, 2778:, 2774:, 2770:, 2766:, 2762:, 2748:, 2744:, 2740:, 2736:, 2722:, 2704:, 2700:, 2605:. 2595:. 2585:36 2583:. 2579:. 2552:. 2525:. 2515:. 2503:11 2501:. 2497:. 2474:. 2464:. 2452:15 2450:. 2446:. 2423:. 2411:. 2407:. 2388:. 2364:. 2342:. 2332:. 2324:. 2314:35 2312:. 2308:. 2283:. 2252:. 2242:. 2230:. 2226:. 2203:. 2189:. 2185:. 2162:. 2152:. 2144:. 2134:37 2132:. 2128:. 2116:^ 2102:. 2092:. 2082:30 2080:. 2076:. 2035:. 1988:. 1985:. 1936:. 1912:. 1888:. 1868:}} 1864:{{ 1850:. 1824:. 1820:. 1798:. 1788:. 1778:38 1776:. 1772:. 1760:^ 1746:. 1736:. 1726:. 1716:85 1714:. 1710:. 1680:. 1672:. 1660:. 1604:. 1429:- 1421:* 1413:X 1398:Z 1388:Y 1378:W 1368:V 1357:U 1347:T 1337:S 1327:R 1317:Q 1307:P 1296:O 1286:N 1272:M 1262:L 1252:K 1237:J 1227:I 1217:H 1207:G 1197:F 1187:E 1177:D 1167:C 1152:B 1142:A 1117:- 1103:N 1089:V 1075:H 1061:D 1047:B 1033:W 1019:S 1001:M 984:K 967:Y 957:pu 950:R 936:i 920:U 917:U 904:T 901:T 888:G 885:G 872:C 869:C 856:A 853:A 186:. 179:. 56:no 2675:e 2668:t 2661:v 2613:. 2591:: 2564:. 2560:: 2550:" 2546:" 2533:. 2509:: 2482:. 2458:: 2431:. 2419:: 2413:7 2350:. 2320:: 2293:. 2279:: 2260:. 2238:: 2232:9 2211:. 2197:: 2191:7 2170:. 2140:: 2110:. 2088:: 2061:. 2039:. 2003:. 1970:. 1947:. 1922:. 1898:. 1874:) 1860:. 1835:. 1806:. 1784:: 1754:. 1730:: 1722:: 1688:. 1676:: 1668:: 1602:R 1582:- 1578:. 1278:/ 1110:N 1096:V 1082:H 1068:D 1054:B 1040:W 1026:S 1011:M 1009:a 993:K 976:Y 974:p 959:R 941:i 925:U 909:T 893:G 877:C 861:A

Text is available under the Creative Commons Attribution-ShareAlike License. Additional terms may apply.
