Feasibility of a whole-genome motif search?
Transcription control sites
(~7 bases of information)
- 7 bases of information = 14 bits (2 bits per base), and 2^14 = 16,384, so there is ~1 chance match every 16,000 sites.
- That gives ~1500 chance matches in a 12 Mb genome (24 × 10^6 sites, counting both strands).
- The number of chance matches for a given motif is Poisson-distributed with mean ~1500. This is well approximated by a normal distribution with mean 1500 and standard deviation √1500 ≈ 40 sites.
- Therefore, a motif needs ~100 sites in excess of the ~1500 expected by chance (about 2.5 standard deviations) to give a detectable signal above background.
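The arithmetic above can be checked with a short sketch. It assumes that "7 bases of information" means one exact 7-mer with equally likely, independent bases (2 bits each), and that both strands of the 12 Mb genome are scanned. The helper `z_score` is a hypothetical name introduced only for this illustration.

```python
import math

# Assumption: 7 bases of information = one exact 7-mer, 2 bits per base
# (uniform, independent base composition).
bits = 2 * 7
match_rate = 1 / 2 ** bits           # chance a random site matches: 1/16384

# Assumption: 12 Mb genome scanned on both strands -> ~24 million sites.
genome_sites = 2 * 12_000_000

mean = genome_sites * match_rate      # expected chance matches (Poisson mean)
sd = math.sqrt(mean)                  # for a Poisson distribution, variance = mean


def z_score(extra_sites: int) -> float:
    """Hypothetical helper: how many SDs an excess of sites sits above background."""
    return extra_sites / sd


print(f"1 chance match per {2 ** bits} sites")
print(f"expected background matches: {mean:.0f}")
print(f"standard deviation: {sd:.1f}")
print(f"100 extra sites = {z_score(100):.1f} SD above background")
```

With these assumptions the exact figures come out near 1465 expected matches and a standard deviation near 38, which the outline rounds to ~1500 and ~40; an excess of 100 sites then sits roughly 2.6 standard deviations above the mean.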