FASTA format description


A sequence in FASTA format begins with a single-line description, followed by lines of sequence data. The description line is distinguished from the sequence data by a greater-than (">") symbol in the first column. It is recommended that all lines of text be shorter than 80 characters in length.
An example sequence in FASTA format is:

>sp|P08988|AAC4_SALSP AMINOGLYCOSIDE N3'-ACETYLTRANSFERASE IV (EC 2.3.1.81)
MQYEWRKAELIGQLLNLGVTPGGVLLVHSSFRSVRPLEDGPLGLIEALRAALGPGGTLVMPSWSGLDDEPFDPATSPVTP
DLGVVSDTFWRLPNVKRSAHPFAFAAAGPQAEQIISDPLPLPPHSPASPVARVHELDGQVLLLGVGHDANTTLHLAELMA
KVPYGVPRHCTILQDGKLVRVDYLENDHCCERFALADRWLKEKSLQKEGPVGHAFARLIRSRDIVATALGQLGRDPLIFL
HPPEGGMRRMRCRSPVDWLSS
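
The layout described above can be read with a few lines of code. The following is a minimal sketch in Python, not a reference implementation: the function name parse_fasta is hypothetical, and the sketch assumes that blank lines may be skipped and that any text appearing before the first ">" line may be ignored.

```python
def parse_fasta(text):
    """Yield (description, sequence) pairs from FASTA-formatted text.

    Assumptions of this sketch (not stated in the format description):
    blank lines are skipped, and lines before the first ">" are ignored.
    """
    description = None
    chunks = []
    for raw_line in text.splitlines():
        line = raw_line.rstrip()
        if not line:
            continue
        # A ">" in the first column marks a description line.
        if line.startswith(">"):
            if description is not None:
                yield description, "".join(chunks)
            description = line[1:]
            chunks = []
        elif description is not None:
            # Every other line is sequence data for the current record.
            chunks.append(line.strip())
    if description is not None:
        yield description, "".join(chunks)
```

Applied to the example above, list(parse_fasta(text)) would give a single pair: the description line without its leading ">", and the four sequence lines joined into one string.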

Sequences are expected to be represented in the standard IUB/IUPAC amino acid and nucleic acid codes, with these exceptions: a single hyphen or dash can be used to represent a gap of indeterminate length; and in amino acid sequences, U and * are acceptable letters (see below). Before submitting a request, any numerical digits in the query sequence should either be removed or replaced by appropriate letter codes (e.g., N for unknown nucleic acid residue or X for unknown amino acid residue).
The accepted amino acid codes are:

    A  alanine                         P  proline
    B  aspartate or asparagine         Q  glutamine
    C  cysteine                        R  arginine
    D  aspartate                       S  serine
    E  glutamate                       T  threonine
    F  phenylalanine                   U  selenocysteine
    G  glycine                         V  valine
    H  histidine                       W  tryptophan
    I  isoleucine                      Y  tyrosine
    K  lysine                          Z  glutamate or glutamine
    L  leucine                         X  any
    M  methionine                      *  translation stop
    N  asparagine                      -  gap of indeterminate length
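
A query can be checked against the table above before it is submitted. The following Python sketch is illustrative only: the names AMINO_ACID_CODES, remove_digits and invalid_codes are hypothetical, and the check is case-sensitive because the table lists upper-case codes only (how lower-case letters are treated is not stated here).

```python
# One-letter codes taken from the table above, plus "*" and "-".
AMINO_ACID_CODES = set("ABCDEFGHIKLMNPQRSTUVWXYZ*-")


def remove_digits(sequence):
    """Return the sequence with all numerical digits removed."""
    return "".join(ch for ch in sequence if not ch.isdigit())


def invalid_codes(sequence):
    """Return a sorted list of characters that are not in the table."""
    return sorted(set(sequence) - AMINO_ACID_CODES)
```

For example, remove_digits("MQ1YE2W") returns "MQYEW", and invalid_codes("MJOQ") reports J and O, which do not appear in the table. Replacing digits with X rather than removing them is the other option the text allows; the sketch shows removal only.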