Finding Frameshifts

Shifts in reading frame caused by insertion or deletion errors in a DNA sequence are the bane of many a sequencer. A number of methods and tools have been developed that can find frameshifts, some of them designed specifically for that purpose.

DETECT

DETECT is a program that finds frameshifts by searching a nucleotide sequence against a protein database (PNAS 89:4698).

TBLASTN / TFASTA

The BLAST and FASTA families of search programs each include a program (TBLASTN and TFASTA, respectively) that searches a protein query against a nucleotide database by translating the nucleotide database in all six reading frames.
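
To make "all six reading frames" concrete, here is a minimal sketch of a six-frame translation in Python. It is not the code used by TBLASTN or TFASTA; the function names are made up for illustration, the standard genetic code is assumed, and any codon containing a character other than A, C, G or T is written as X.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (an assumption:
# other translation tables are not handled here).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq):
    """Translate complete codons from the start of seq; unknown codons become X."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frames(seq):
    """Map frame labels +1..+3 and -1..-3 to their translations."""
    seq = seq.upper()
    rc = reverse_complement(seq)
    frames = {}
    for offset in range(3):
        frames[offset + 1] = translate(seq[offset:])
        frames[-(offset + 1)] = translate(rc[offset:])
    return frames


frames = six_frames("ATGGCCTAA")
print(frames[1])   # MA*
print(frames[-1])  # LGH
```

A single inserted or deleted base moves everything downstream of it into one of the other two frames on the same strand, which is why a search that looks at every frame can still pick up the shifted part of a coding region.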

BLASTX

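BLASTX, which compares the six-frame translation of a nucleotide query against a protein database, can be used to identify frameshifts: a frameshift typically appears as adjacent matches to the same protein that lie in different reading frames.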
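
One way to use translated-search output for this is to look for neighbouring matches to the same protein that sit in different frames on the same strand. The sketch below assumes the hits have already been parsed into simple records; the Hsp record, the candidate_frameshifts function and the max_gap cutoff are all hypothetical and are not part of BLAST. It also ignores whether the two matches are colinear on the protein, which a real check would need to confirm.

```python
from typing import NamedTuple


class Hsp(NamedTuple):
    """One match from a translated search (hypothetical record layout)."""
    subject_id: str   # protein that was hit
    query_start: int  # 1-based nucleotide coordinates on the query
    query_end: int
    frame: int        # +1..+3 or -1..-3


def candidate_frameshifts(hsps, max_gap=30):
    """Report neighbouring matches to one protein that differ in frame.

    max_gap is an arbitrary cutoff (in nucleotides) on the distance between
    the end of one match and the start of the next.
    """
    groups = {}
    for h in hsps:
        # Group by protein and strand; a frameshift does not change strand.
        groups.setdefault((h.subject_id, h.frame > 0), []).append(h)

    found = []
    for (subject_id, _), group in groups.items():
        group.sort(key=lambda h: min(h.query_start, h.query_end))
        for left, right in zip(group, group[1:]):
            left_end = max(left.query_start, left.query_end)
            right_start = min(right.query_start, right.query_end)
            close = abs(right_start - left_end) <= max_gap
            if left.frame != right.frame and close:
                found.append(
                    (subject_id, left_end, right_start, left.frame, right.frame)
                )
    return found


hits = [
    Hsp("P1", 1, 150, 1),
    Hsp("P1", 152, 300, 2),
    Hsp("P2", 10, 200, 3),
]
print(candidate_frameshifts(hits))  # [('P1', 150, 152, 1, 2)]
```

Each reported tuple only marks a region worth inspecting by eye; a frame change between matches can also come from an intron or from two unrelated matches.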

Darwin

Darwin is a sequence analysis environment developed by Gaston Gonnet and coworkers. Among the analysis functions is a full dynamic programming search between a nucleotide query and a protein database which can find frameshifts.

States, D.J., & Botstein, D.

Molecular sequence accuracy and the analysis of protein coding regions. PNAS 88:5518 (1991).
A discussion of methods and limits in finding frameshifted reading frames.

Specialized Matrices

Claverie (J. Mol. Biol. 234:1140, 1993) has developed a set of substitution matrices designed explicitly for finding possible frameshifts in protein sequences. These matrices are designed solely for use in protein-protein comparisons; they should not be used with programs that blindly translate DNA (e.g. BLASTX, TBLASTN).

Codon Usage Methods

In theory, codon usage methods can be used to identify frameshifts. In practice, no one has developed an automated approach to this, although the graphical output of many programs can be useful in this regard.
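
The idea behind the "in theory" can be sketched in a few lines of Python: score each of the three forward frames by how well its codons match a table of preferred codons, window by window, and look for the place where the best-scoring frame changes. This is a toy illustration only, not an existing tool. The function name, the window size, the default score for unlisted codons and the one-entry score table in the example are all assumptions made for the sketch; a real table would be derived from the known genes of the organism.

```python
def best_frame_per_window(seq, codon_scores, window=30, default=-1.0):
    """Return (window_start, best_frame) pairs for consecutive windows.

    Frames are 0, 1 or 2, defined by codon start position modulo 3 on the
    whole sequence, so a change in best_frame between windows suggests an
    insertion or deletion somewhere near the boundary.
    codon_scores maps codons to scores (higher = more typical of coding
    sequence); codons missing from the table receive `default`.
    """
    seq = seq.upper()
    calls = []
    for start in range(0, len(seq) - window + 1, window):
        totals = [0.0, 0.0, 0.0]
        for pos in range(start, start + window):
            codon = seq[pos:pos + 3]
            if len(codon) == 3:  # skip the incomplete codons at the very end
                totals[pos % 3] += codon_scores.get(codon, default)
        calls.append((start, max(range(3), key=lambda f: totals[f])))
    return calls


# Toy example: a run of a "preferred" codon, one inserted base, then more.
toy_scores = {"GCC": 1.0}
toy_seq = "GCC" * 10 + "A" + "GCC" * 10
print(best_frame_per_window(toy_seq, toy_scores))  # [(0, 0), (30, 1)]
```

On real sequence the signal is far noisier than in this example, which is consistent with such plots being read by eye rather than interpreted automatically.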