A pairwise sequence alignment from a BLAST report
The alignment is preceded by the sequence identifier, the full definition line, and the length of the matched sequence, in amino acids. Next comes the bit score (the raw score is in parentheses) and then the E-value. The following line gives the number of identical residues in this alignment (Identities), the number of positions with a positive substitution score, that is, identities plus conservative substitutions (Positives), and, if applicable, the number of gaps in the alignment. Finally, the alignment itself is shown, with the query on top and the database match, labeled Sbjct, below. The numbers at left and right refer to positions in the amino acid sequence. One or more dashes (–) within a sequence indicate insertions or deletions. Amino acid residues in the query sequence that have been masked because of low complexity are replaced by Xs (see, for example, the fourth and last blocks). The line between the two sequences indicates the similarities between them: if the query and the subject have the same amino acid at a given position, the residue itself is shown, and conservative substitutions, as judged by the substitution matrix, are indicated with +.
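The relationship between the middle line and the Identities, Positives, and Gaps counts can be made concrete with a short Python sketch. It assumes the query row, middle line, and Sbjct row of one block have already been extracted as equal-length strings (labels and position numbers removed, middle line padded to full width). The names `tally`, `end_position`, and `summary`, and the short alignment itself, are invented for illustration; they do not come from any BLAST report or library, and the percentage rounding is this sketch's own choice, which may not match BLAST's exactly.

```python
from dataclasses import dataclass


@dataclass
class AlignmentStats:
    length: int      # alignment columns, gap columns included
    identities: int  # columns where query and subject residues are identical
    positives: int   # identities plus conservative substitutions ("+")
    gaps: int        # columns with a dash in either sequence


def tally(query: str, midline: str, sbjct: str) -> AlignmentStats:
    """Count identities, positives and gaps for one hypothetical alignment block.

    Assumes the three strings are already aligned column for column.
    """
    if not (len(query) == len(midline) == len(sbjct)):
        raise ValueError("query, midline and sbjct must have equal length")
    identities = positives = gaps = 0
    for q, m, s in zip(query, midline, sbjct):
        if q == "-" or s == "-":
            gaps += 1
        elif m == "+":
            positives += 1      # conservative substitution
        elif m.isalpha():
            identities += 1     # the residue itself is shown for identities
            positives += 1      # identities also count as positives
    return AlignmentStats(len(query), identities, positives, gaps)


def end_position(start: int, aligned: str) -> int:
    """Last residue number covered by an aligned row; dashes are not residues."""
    residues = len(aligned) - aligned.count("-")
    return start + residues - 1


def summary(st: AlignmentStats) -> str:
    """Format the counts in the style of the report's Identities line."""
    def pct(n: int) -> int:
        return round(100 * n / st.length)
    return (f"Identities = {st.identities}/{st.length} ({pct(st.identities)}%), "
            f"Positives = {st.positives}/{st.length} ({pct(st.positives)}%), "
            f"Gaps = {st.gaps}/{st.length} ({pct(st.gaps)}%)")


# Invented example block, both rows starting at residue 1.
query   = "MKTAYIAKQRQISFVKSHFSRQ"
midline = "MK AY+AK+ Q  F +S FSRQ"
sbjct   = "MKVAYLAKELQ--FGRS-FSRQ"

stats = tally(query, midline, sbjct)
print(summary(stats))
print(end_position(1, query), end_position(1, sbjct))  # 22 19
```

The Sbjct row spans 22 columns but only 19 residues, because its three dashes are not residues; that is why the right-hand position number of a gapped row is smaller than the column count alone would suggest.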
The traditional report is designed for human readability rather than for parsing by a program. For example, the one-line descriptions give people a quick overview of their search results, but because space is limited they are rarely complete descriptors. Also, for convenience, several pieces of information (for example, the E-values, scores, and descriptions) are displayed in both the one-line descriptions and the alignments, so the person viewing the search output does not need to move back and forth between sections. New features may also be added to the report, such as links from sequence hits to Entrez Gene records (Chapter 19), and these change the output format. Such changes are easy for people to pick up on and take advantage of, but they can trip up programs that parse this BLAST output.
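The fragility described above is easy to reproduce. The Python sketch below uses a hypothetical, deliberately strict regular expression written against one layout of the score line. Both sample lines are invented; the extra trailing field in the second is modeled on the `Method:` annotation that some BLAST+ versions append, and its exact wording should be treated as illustrative. A person reads the two lines the same way, but the parser accepts only the first.

```python
import re

# Hypothetical parser, written against one exact layout of the score line.
SCORE_LINE = re.compile(
    r"^\s*Score = (?P<bits>[\d.]+) bits \((?P<raw>\d+)\),\s+Expect = (?P<evalue>\S+)$"
)


def parse_score_line(line: str):
    """Return (bit score, raw score, E-value), or None if the layout differs."""
    m = SCORE_LINE.match(line)
    if m is None:
        return None
    return float(m["bits"]), int(m["raw"]), float(m["evalue"])


# Invented sample lines; the second adds one trailing field.
old_line = " Score = 52.4 bits (124),  Expect = 3e-08"
new_line = " Score = 52.4 bits (124),  Expect = 3e-08, Method: Compositional matrix adjust."

print(parse_score_line(old_line))  # (52.4, 124, 3e-08)
print(parse_score_line(new_line))  # None: the extra field breaks the match
```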