-------------------------------------------------------------------------------------
CATSCORE PERL SCRIPTS
(c) 2002 Kevin Cheung
-------------------------------------------------------------------------------------
DESCRIPTION
-------------------------------------------------------------------------------------

This set of Perl scripts is described in the paper "A microarray-based antibiotic
screen identifies a regulatory role for supercoiling in the osmotic stress response
of Escherichia coli," by Kevin J. Cheung, Vasudeo Badarinarayana, Douglas Selinger,
Daniel Janse, and George M. Church.

Analysis of clustered microarray data benefits from a systematic approach to
characterizing clusters, and specifically to identifying what coordinately regulated
genes have in common. Hypotheses correlating gene regulation with gene function can
then be developed. CATSCORE implements a test (as described by Tavazoie 1999) that
estimates the statistical enrichment of functional categories (derived from gene
association data) in specific clusters.

As reference we used GenProtEC, the Escherichia coli genome and protein database
maintained on the website of the Marine Biological Laboratory at Woods Hole, MA.
This data is a composite of the functional classifications compiled by Riley and
Labedan in Neidhardt's Escherichia coli and Salmonella: Cellular and Molecular
Biology, 2nd Edition, and work by Serres and Riley on paralogous proteins in E. coli.
We processed the 8,434 gene classifications and 338 functional categorizations into
a format amenable to computational analysis. CATSCORE then performs a hypergeometric
test (equivalent to a one-sided Fisher exact test on a 2x2 table) for each functional
categorization against all clusters.
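
The hypergeometric test named above can be illustrated with a short sketch. This is
not the CATSCORE Perl code. It is a minimal Python illustration of the standard
upper-tail hypergeometric probability: the chance of seeing at least the observed
number of category members in a cluster if genes were drawn at random. The function
name, argument names, and the numbers in the example call are all hypothetical and
chosen only for illustration.

```python
from math import comb


def enrichment_p_value(total_genes, category_size, cluster_size, overlap):
    """Return P(X >= overlap) for a hypergeometric draw.

    total_genes   -- number of genes in the reference set
    category_size -- number of those genes annotated to the category
    cluster_size  -- number of genes in the cluster
    overlap       -- number of cluster genes annotated to the category
    (All names are illustrative assumptions, not CATSCORE identifiers.)
    """
    if not 0 <= category_size <= total_genes:
        raise ValueError("category_size must lie between 0 and total_genes")
    if not 0 <= cluster_size <= total_genes:
        raise ValueError("cluster_size must lie between 0 and total_genes")

    # The overlap can never exceed either the category or the cluster.
    max_overlap = min(category_size, cluster_size)

    # Number of ways to choose any cluster of this size from the reference set.
    all_clusters = comb(total_genes, cluster_size)

    # Sum the upper tail directly rather than computing 1 - lower tail,
    # which avoids cancellation error for very small p-values.
    tail = sum(
        comb(category_size, i) * comb(total_genes - category_size, cluster_size - i)
        for i in range(max(overlap, 0), max_overlap + 1)
    )
    return tail / all_clusters


if __name__ == "__main__":
    # Made-up numbers: 4000 genes, 40 in the category, a cluster of 100
    # genes, 8 of which fall in the category.
    print(enrichment_p_value(4000, 40, 100, 8))
```

A small p-value suggests the category is over-represented in the cluster. Because
every category is tested against every cluster, some allowance for multiple testing
is usually needed when choosing a significance cutoff.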
-------------------------------------------------------------------------------------
REQUIREMENTS
-------------------------------------------------------------------------------------

These scripts were designed for use with Windows systems running ActivePerl 5.6, but
should run on UNIX or Linux as well.

GeneCluster can be obtained at:
http://www-genome.wi.mit.edu/cancer/software/genecluster2/gc2.html

-------------------------------------------------------------------------------------
README
-------------------------------------------------------------------------------------

Enclosed are all relevant files needed to analyze clustered output from the Whitehead
Institute GeneCluster program (Tamayo 1999). The pipeline is as follows:

    Table of microarray data
              ||
              ||
              \/
    run GeneCluster
    (GeneCluster generates two files based on a prefix,
     e.g. osr175 will generate osr175_data.txt and osr175_centroids.txt)
              ||
              ||
              \/
    at the command line, run "alltests file_data.txt" (e.g. osr175_data.txt)
              ||
              ||
              \/
    output is found in file_data.txt.top5 and file_data.txt.scores