Sequence-specific DNA-binding proteins perform a variety of roles in the cell, including transcriptional regulation. Known binding sites for a DNA-binding protein can be used to identify additional sites for that protein, and thereby further genes that it regulates. The availability of complete bacterial genome sequences offers new opportunities to describe networks of regulatory interactions. Of the 240-260 candidate E. coli DNA-binding proteins, only around 59 have binding sites identified by DNA footprinting. We used these sites, based on data in the DPInteract database, to construct recognition matrices, and then used the matrices to search for additional binding sites in the E. coli genomic sequence. We scored genomic sites with the matrix scoring method of Berg and von Hippel because scores from this method have been shown to correlate with in vitro binding constants. Sites found by many of our matrices show a strong preference for non-coding DNA. We have used the results of these matrix searches to make a set of predictions, which we are currently verifying experimentally.
Robison, K., McGuire, A. M., Church, G. M. A comprehensive library of DNA-binding site matrices for 55 proteins applied to the complete Escherichia coli K12 genome. Journal of Molecular Biology (1998) 284, 241-254.
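The matrix scoring described in the abstract can be illustrated with a short sketch. This is not the scanACE code; it is a minimal Python illustration of a Berg and von Hippel style score, in which each position of a candidate site contributes the log ratio of the count of the observed base to the count of the most common base at that position among the known sites. The function names, the pseudocount of 0.5, the example sites, and the threshold are all assumptions chosen for illustration, and the sketch scans only one strand.

```python
import math

BASES = "ACGT"


def count_matrix(sites):
    """Count each base at each position of equal-length, aligned binding sites."""
    width = len(sites[0])
    if any(len(site) != width for site in sites):
        raise ValueError("all sites must have the same length")
    counts = [{base: 0 for base in BASES} for _ in range(width)]
    for site in sites:
        for pos, base in enumerate(site.upper()):
            counts[pos][base] += 1
    return counts


def score_site(counts, site, pseudocount=0.5):
    """Sum over positions of ln((n_observed + p) / (n_consensus + p)).

    The pseudocount p (0.5 here, an assumption) keeps the logarithm finite
    for bases never seen at a position. A site matching the most common
    base everywhere scores 0; every mismatch makes the score more negative.
    """
    total = 0.0
    for column, base in zip(counts, site.upper()):
        consensus_count = max(column.values())
        total += math.log((column[base] + pseudocount) /
                          (consensus_count + pseudocount))
    return total


def scan(counts, sequence, threshold, pseudocount=0.5):
    """Return (start, window, score) for each window scoring >= threshold.

    Only the given strand is scanned; windows containing characters
    other than A, C, G, T are skipped.
    """
    width = len(counts)
    sequence = sequence.upper()
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        if any(base not in BASES for base in window):
            continue
        score = score_site(counts, window, pseudocount)
        if score >= threshold:
            hits.append((start, window, score))
    return hits


# Hypothetical aligned sites, for illustration only (not DPInteract data).
example_sites = ["TTGACA", "TTGACA", "TTGACT", "TAGACA"]
example_counts = count_matrix(example_sites)
example_hits = scan(example_counts, "GGTTGACAGG", threshold=-0.1)
```

In practice a threshold might be chosen from the distribution of scores of the known sites themselves; the value used above is arbitrary. Scanning the reverse complement as well would be needed to cover both strands.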
Summary Table (Number of hits in the genome for each matrix)
Download the code for the matrix search program scanACE
Sequence logos (PostScript format)
Explanation of file formats
Sequencing inconsistencies noted
Last updated 11/4/98