A number of sequence pre-filters have been developed to aid in sequence analysis.
Reducing matches due to biased amino acid composition
- Many amino acid sequences are highly repetitive, especially naive translations of genomic DNA. Matches between such segments are more likely to reflect these local biases in amino acid composition than common descent. Filters have been developed to mask out regions of highly biased local composition.
XNU and SEG have been integrated into the network BLAST server, but little about their operation would preclude using them with other programs.
- (Wootton & Federhen, Computers & Chemistry, 17:149. 1993)
- (Claverie & States, Computers & Chemistry, 17:191. 1993)
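The general idea behind these filters can be illustrated with a simplified masker. The sketch below is not SEG or XNU: it slides a fixed window along the sequence, computes the Shannon entropy (in bits) of the residue counts in each window, and replaces every residue in a window whose entropy falls below a cutoff with an ambiguity character. The function names, the window length of 12 and the 2.2-bit cutoff are illustrative assumptions; SEG refines its low-complexity segments further, and XNU instead looks for short-period internal repeats.

```python
import math
from collections import Counter


def shannon_entropy(segment: str) -> float:
    """Shannon entropy, in bits, of the residue composition of segment."""
    counts = Counter(segment)
    n = len(segment)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def mask_low_complexity(seq: str, window: int = 12, threshold: float = 2.2,
                        mask_char: str = "X") -> str:
    """Mask every residue covered by a window whose entropy is below threshold.

    Hypothetical helper: window and threshold are illustrative values,
    not a reimplementation of SEG's two-stage algorithm.
    """
    seq = seq.upper()
    masked = list(seq)
    # Sequences shorter than the window produce no windows and are returned unmasked.
    for start in range(len(seq) - window + 1):
        if shannon_entropy(seq[start:start + window]) < threshold:
            for i in range(start, start + window):
                masked[i] = mask_char
    return "".join(masked)


if __name__ == "__main__":
    protein = "MKTAYIAKQRQISFVKSHFSRQ" + "Q" * 16 + "LEDRGWTVVNMPLHDK"
    print(mask_low_complexity(protein))
```

Applied to this example protein, the poly-Q stretch (and possibly a few flanking residues caught in the same windows) comes back as X's, while the compositionally diverse ends are untouched. For nucleotides, pass `mask_char="N"` and a much lower threshold, since a four-letter alphabet caps window entropy at 2 bits.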
Reducing matches to "uninteresting" sequences
- XBLAST (not to be confused with BLASTX) is a program that masks regions of a query sequence using a previous BLAST output as a guide (Claverie & States, Computers & Chemistry, 17:191. 1993). In other words, given a sequence and a BLAST report for that sequence, XBLAST outputs the sequence with every region matched in the report masked by ambiguity characters (X for proteins; N for nucleotides). This can greatly improve the readability of subsequent BLAST reports by removing uninformative or confusing matches. For example, suppose you have just sequenced 20 kb of human DNA. That DNA is likely to contain various repetitive sequences, such as Alu elements. A BLASTN search will return many hits involving Alu elements, which might obscure more interesting hits involving other similarities. Hence, a wise sequence of searches would be:
- BLASTN search versus Human Repetitive Sequences (Rep)
- XBLAST of the query, using the report from the Rep search
- BLASTN search of GenBank (or dbEST) using XBLAST-processed query
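The masking step in the middle of this workflow can be sketched as follows, assuming the query coordinates of each hit have already been extracted from the Rep search report (parsing the report itself is not shown). The function name `mask_intervals` and the `(start, end)` input format, 1-based and inclusive as BLAST prints query coordinates, are assumptions for illustration, not XBLAST's actual interface.

```python
from typing import Iterable, Tuple


def mask_intervals(query: str, hits: Iterable[Tuple[int, int]],
                   mask_char: str = "X") -> str:
    """Replace every query position covered by a hit with mask_char.

    Hypothetical helper: hits are (start, end) query coordinates,
    1-based and inclusive; a reversed pair (start > end) is accepted.
    """
    masked = list(query)
    for start, end in hits:
        lo, hi = sorted((start, end))
        # Clamp to the query so out-of-range coordinates cannot raise IndexError.
        lo = max(lo, 1)
        hi = min(hi, len(query))
        for i in range(lo - 1, hi):  # convert 1-based inclusive to 0-based
            masked[i] = mask_char
    return "".join(masked)


if __name__ == "__main__":
    # Two hypothetical repeat hits on a short nucleotide query.
    print(mask_intervals("ACGTACGTAAGGCCTTACGT", [(3, 6), (15, 18)], "N"))
```

With these inputs the call returns `ACNNNNGTAAGGCCNNNNGT`. Masking with ambiguity characters rather than deleting the matched regions keeps the query's length and coordinates intact, so hits from the later GenBank or dbEST search still map back to positions in the original sequence.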
This document is intended to serve as a guide to using certain bioinformatics programs. It cannot be guaranteed to be free of errors or completely up-to-date. If you know of errors or other shortcomings of this document, please mail them to Keith Robison.