Example of upstream region prediction. If a gene lies within an operon, its promoter may lie several genes upstream, so several candidate intergenic regions must be included. Here, two tandem genes (adjacent genes on the same strand) separated by less than a cutoff distance are considered part of the same operon. We include up to 300 bp of noncoding sequence directly upstream of the head of the predicted operon, as well as the entire sequence of every intergenic segment longer than 20 bp between the gene of interest and the operon head. This figure shows the predicted upstream region for gene F.
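The operon definition above can be sketched in a few lines of Python. This is a minimal illustration, not the program's own code: the (name, start, end, strand) tuple layout, the 0-based half-open coordinates, and the function names `predict_operons` and `operon_head` are all hypothetical, and genes are assumed to be sorted by start coordinate.

```python
def predict_operons(genes, cutoff):
    """Group genes into predicted operons.

    `genes` is a list of (name, start, end, strand) tuples sorted by start,
    with 0-based half-open coordinates (assumed layout). Two neighbouring
    genes join the same operon when they are on the same strand (tandem)
    and the gap between them is shorter than `cutoff`.
    """
    operons = []
    for gene in genes:
        if operons:
            prev = operons[-1][-1]
            gap = gene[1] - prev[2]
            if gene[3] == prev[3] and gap < cutoff:
                operons[-1].append(gene)
                continue
        operons.append([gene])
    return operons


def operon_head(operon):
    """Return the first transcribed gene: leftmost on "+", rightmost on "-"."""
    return operon[0] if operon[0][3] == "+" else operon[-1]


genes = [("A", 0, 100, "+"), ("B", 150, 300, "+"),
         ("C", 900, 1000, "+"), ("D", 1020, 1100, "-")]
for op in predict_operons(genes, cutoff=100):
    print([g[0] for g in op], "head:", operon_head(op)[0])
```

Here A and B (gap 50 bp) form one operon headed by A, while C is too far from B and D is on the opposite strand, so each stands alone.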
First, the algorithm checks the length of the intergenic region directly upstream of the gene in question. If this region is shorter than the distance cutoff, the entire region is stored for motif finding, and the next intergenic region further upstream is considered as well. This continues until the algorithm reaches an intergenic region that either lies between divergently transcribed genes or is longer than the distance cutoff.
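The upstream walk described above might look roughly like the following sketch. It rests on stated assumptions: genes are non-overlapping and sorted by start, coordinates are 0-based half-open, and the `Gene` class, the `upstream_segments` function and its `max_len` default of 300 bp (taken from the figure description) are hypothetical. A minus-strand gene is transcribed right to left, so its upstream neighbour is the next gene to the right.

```python
from dataclasses import dataclass


@dataclass
class Gene:
    name: str
    start: int   # 0-based, inclusive (assumed convention)
    end: int     # 0-based, exclusive
    strand: str  # "+" or "-"


def upstream_segments(genes, idx, cutoff, max_len=300):
    """Return (start, end) intergenic segments upstream of genes[idx].

    Assumes `genes` is sorted by start and contains no overlapping genes.
    A tandem region shorter than `cutoff` is stored whole and the walk moves
    one gene further upstream. The first divergent or long region marks the
    operon head; only the `max_len` bp next to the head are kept from it.
    The walk also stops, without a final region, at the end of the list.
    """
    step = -1 if genes[idx].strand == "+" else 1  # direction of "upstream"
    segments = []
    cur = idx
    while 0 <= cur + step < len(genes):
        gene, neighbour = genes[cur], genes[cur + step]
        if step == -1:
            lo, hi = neighbour.end, gene.start   # region left of a "+" gene
        else:
            lo, hi = gene.end, neighbour.start   # region right of a "-" gene
        tandem = neighbour.strand == gene.strand
        if tandem and hi - lo < cutoff:
            segments.append((lo, hi))  # short tandem region: keep it all
            cur += step                # and continue further upstream
            continue
        # Divergent or long region: genes[cur] is the operon head.
        if step == -1:
            segments.append((max(lo, hi - max_len), hi))
        else:
            segments.append((lo, min(hi, lo + max_len)))
        break
    return segments


genes = [Gene("A", 0, 100, "+"), Gene("B", 500, 600, "+"),
         Gene("C", 650, 700, "+")]
print(upstream_segments(genes, idx=2, cutoff=100))
```

For gene C, the 50 bp gap to B is stored whole; the 400 bp gap upstream of B exceeds the cutoff, so B is treated as the operon head and only the 300 bp next to it are kept.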
Parameters:
- Maximum distance between genes in the same operon: two tandem genes separated by less than this distance are considered part of the same "operon", and upstream region prediction continues upstream past both genes.
- Minimum sequence to save upstream of each gene: if two tandem genes are separated by less than this minimum distance, that sequence segment is ignored, and the algorithm continues to the next gene upstream.
- Maximum sequence to save upstream of each gene: if two genes are separated by more than this distance, only this maximum length of sequence is taken.
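As a rough sketch of how the two length parameters could act on each stored segment: segments shorter than the minimum are dropped, and longer ones are clipped to the maximum. Keeping the part adjacent to the downstream gene is an assumption, as are the function name `apply_length_limits`, the 0-based half-open coordinates, and the defaults (20 and 300 bp are borrowed from the figure description).

```python
def apply_length_limits(segments, strand, min_len=20, max_len=300):
    """Filter and clip (start, end) intergenic segments, 0-based half-open.

    Segments shorter than min_len are skipped (the walk would simply move
    on to the next gene upstream). Segments longer than max_len are cut
    down to the max_len bp nearest the gene downstream of them.
    """
    kept = []
    for lo, hi in segments:
        length = hi - lo
        if length < min_len:
            continue
        if length > max_len:
            if strand == "+":
                lo = hi - max_len   # a "+" gene's 5' end sits at hi
            else:
                hi = lo + max_len   # a "-" gene's 5' end sits at lo
        kept.append((lo, hi))
    return kept


print(apply_length_limits([(600, 610), (100, 500)], "+"))
```

The 10 bp segment falls below the minimum and is skipped; the 400 bp segment is trimmed to the 300 bp closest to the gene.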
The upstream region extraction program also has a "no operon prediction" option. In this case, the only parameter needed is the exact length of sequence to take upstream of each gene. This is useful if you simply want the upstream region of a particular gene, or if you already know the operon structure.
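In "no operon prediction" mode, extraction reduces to taking a fixed number of bases before the gene's start codon, on the correct strand. The sketch below assumes 0-based half-open gene coordinates on a plain sequence string; the function name `fixed_upstream` is hypothetical, and returning minus-strand regions reverse-complemented (so they read 5' to 3' on the gene's strand) is a common convention rather than something the program is documented to do.

```python
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def fixed_upstream(seq, start, end, strand, length):
    """Return the `length` bp immediately upstream of a gene.

    `start`/`end` are 0-based half-open gene coordinates on `seq` (assumed).
    The result is clipped at the ends of `seq` and, for "-" genes, is
    reverse-complemented so it reads 5'->3' on the gene's strand.
    """
    if strand == "+":
        # max() prevents a negative index from wrapping around
        return seq[max(0, start - length):start]
    # slicing past the end of seq clips automatically
    region = seq[end:end + length]
    return region.translate(_COMPLEMENT)[::-1]


seq = "AACCGGTT"
print(fixed_upstream(seq, 5, 8, "+", 10))  # clipped at the sequence start
print(fixed_upstream(seq, 0, 2, "-", 3))
```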