Documenting your search

Most scientists would agree that it is important to document your work so that another scientist can replicate it, a tradition begun by Louis Pasteur. Unfortunately, many published sequence comparisons do not meet this test. Here are the parameters that should be stated explicitly and clearly.
Algorithm(s)
Substitution matrix
All modern search programs use substitution matrices. The choice of substitution matrix can greatly affect search results; it is therefore imperative to document which matrix or matrices were used in searching and aligning.
Gap penalty
For algorithms which use gap penalties (such as FASTA), it is critical to state the gap penalty used.
Name of database
Specify your database explicitly (SwissProt, PIR, GenBank, EMBL, dbEST), not by type (nucleotide, protein, sequence).
Version of database
Databases are changing very rapidly, much faster than the publication cycle and frequently faster than local system administrators can handle. It is therefore critical to state the version of the database used. If you searched a constantly updated database, state the date of the last search.
Computer used
This is actually the least important parameter to state, as the same algorithm with the same database and parameters should produce the same results on every machine. However, if an off-site computer system was used (such as an E-mail or Internet server), then it is standard scientific courtesy to identify the server and credit its maintainers.
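
One way to make sure none of the parameters above gets lost is to record them at the time of the search and generate the methods sentence from that record. The following Python sketch is only an illustration: the SearchRecord class, its field names, and the wording of the sentence are all hypothetical, and the example values are placeholders, not recommendations.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class SearchRecord:
    """One sequence search with the parameters listed above (hypothetical structure)."""
    algorithm: str                          # e.g. "FASTA"
    substitution_matrix: str                # e.g. "BLOSUM50"
    database: str                           # explicit name, not just "protein"
    database_version: Optional[str] = None  # release label, if the database has one
    search_date: Optional[date] = None      # for constantly updated databases
    gap_penalty: Optional[str] = None       # only for algorithms that use one
    server: Optional[str] = None            # off-site server to credit, if any

    def __post_init__(self):
        # A search cannot be replicated without a database version or a search date.
        if self.database_version is None and self.search_date is None:
            raise ValueError("state the database version or the date of the search")

    def methods_sentence(self) -> str:
        """Build a one-sentence description suitable for a methods section."""
        parts = [f"Searches used {self.algorithm} with the {self.substitution_matrix} matrix"]
        if self.gap_penalty is not None:
            parts.append(f"gap penalty {self.gap_penalty}")
        db = f"against {self.database}"
        if self.database_version is not None:
            db += f" release {self.database_version}"
        if self.search_date is not None:
            db += f" (searched {self.search_date.isoformat()})"
        parts.append(db)
        if self.server is not None:
            parts.append(f"run on {self.server}")
        return ", ".join(parts) + "."


# Placeholder values, for illustration only.
record = SearchRecord(
    algorithm="FASTA",
    substitution_matrix="BLOSUM50",
    database="SwissProt",
    database_version="31",
    gap_penalty="-12/-2",
)
print(record.methods_sentence())
```

The record refuses to exist without either a database version or a search date, since those are the details most often missing from published comparisons. The computer used is optional, in keeping with its low importance.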

This page represents the personal opinions of the author, who does not claim to be without sin in these matters. He would be happy to discuss any of these points publicly or privately.

Keith Robison

KRobison@nucleus.harvard.edu