I. Guide to ORF summary files
There are three ORF summary files provided. These are tab-delimited text files which can easily be imported into Excel©.
LPvsSP_2max_rpt.txt - Log phase versus stationary phase, including genomic DNA data. This is the full data file in the form loaded into ExpressDB. It is the largest file and is used as the example in the description of data fields below. The second maximal probe pair, or "2max", has been used to represent the abundance of the transcripts. See the paper for a discussion of transcript abundance measures. Negative control probe sets 17-20 were used (see GAPS© manual for details).
stat1_cDNA_revcomp_rpt.txt - Reverse complement analysis of the first stationary phase chip using a cDNA labeling protocol. The mean of the 4th through 8th brightest probe pairs, or "4-8max", was used to represent transcript abundance. See the "Intensity" data field for these values. Negative control probe sets 1-20 were used (see GAPS© manual for details).
log_RNA_revcomp_rpt.txt - Reverse complement analysis of a log phase sample using the Affymetrix RNA labeling protocol (4-8max). See above for an explanation of 4-8max. Negative control probe sets 1-20 were used (see GAPS© manual for details).
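For readers who prefer a script to Excel, the following is a minimal Python sketch of reading one of these tab-delimited files with the standard csv module. The two sample rows are invented for illustration, and the column headers are assumed to match the field names in the data field guide; the header text in the actual files may differ.

```python
import csv
import io

# Hypothetical excerpt: the values are invented, and the header text in the
# real files may not match these labels exactly.
SAMPLE = (
    "Bnumber\tGene\tLP Intensity\tLP Detect\n"
    "b0001\tthrL\t1520.5\tP\n"
    "b0002\tthrA\t87.0\tA\n"
)

def read_summary(handle):
    """Return the rows of a tab-delimited summary file as a list of dicts."""
    return list(csv.DictReader(handle, delimiter="\t"))

rows = read_summary(io.StringIO(SAMPLE))

# All values are read as strings; keep only genes called present in log phase.
detected = [row["Gene"] for row in rows if row["LP Detect"] == "P"]
print(detected)
```

To read a real file, pass an open file handle instead of the in-memory sample, for example open("LPvsSP_2max_rpt.txt", newline="").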
II. Guide to data fields
The LPvsSP_2max_rpt.txt file is used as an example. For other files, the labels "LP" or "SP" will be different, depending on what the experiment has been named, but the fields will be otherwise the same.
Bnumber: Unique predicted RNA identifier as given in: Blattner, F.R., et al. 1997. The complete genome sequence of Escherichia coli K-12. Science 277:1453-74.
Gene: Common gene name.
LP Intensity: Log phase 2max (second maximal probe pair (PM-MM), as described in the paper) averaged from duplicate arrays. In other analyses, different probe pairs may be used to represent the transcript. For example, for the two reverse complement analyses the mean of the 4th through 8th brightest probe pairs, or "4-8max", was used.
LP sdev: Standard deviation of Log Phase 2max.
LP Detect: P = present (detected), A = absent (not detected).
SP Intensity: Stationary phase 2max, averaged from duplicate arrays.
SP sdev: Standard deviation of Stationary Phase 2max.
SP Detect: P = present (detected), A = absent (not detected). Default threshold = 3 standard deviations above negative controls.
SP abs change: Absolute change of 2max between log and stationary phase.
SP FC sig: Stationary phase fold change significance. If the fold change was estimated because the gene was undetected in one condition, this field indicates whether the estimate (found in the next column) is a lower or an upper limit. A '>' means the true fold change is greater than the estimate, a '<' that it is less than the estimate. The absence of a symbol in this field means the fold change is a measurement, not an estimate.
SP fold change: Fold change of 2max between log and stationary phase. If the gene is undetected in both conditions, this will be scored 'ND' (not determined). If it was detected in only one condition, the fold change is estimated by substituting the mean of the negative controls + 3 standard deviations for the 2max in the undetected condition. The previous column indicates when an estimate has been made and its direction.
Cal. FC: Calibrated fold change. A non-linear curve fit to a plot of spiked RNA control data (R = 0.94) has been used to estimate the 'true' fold change based on the measured fold change (the previous column). The equation is: calibrated fold change = 1.2 x (measured fold change)^1.9
Note: This measure is not automatically generated by GAPS© but was calculated based on the spiked control data presented in the paper.
SP t-test: Stationary phase t-statistic testing the null hypothesis that the mean 2max values (given in the LP Intensity and SP Intensity fields) are the same.
SP call: Decision of whether the change in 2max between log and stationary phase was significant. SI = significant increase, SD = significant decrease, I = increase, D = decrease, NC = no change. A change is considered significant if either of the following criteria is fulfilled: i) the mean 2max from duplicates is significantly different in the two conditions by a two-tailed Student's t-test with >95% confidence, or ii) after discarding the brightest and dimmest probe pairs, at least 11/13 of the remaining probe pairs have changed in the same direction, by any amount.
gDNA Intensity: 2max of genomic DNA hybridization. Genomic data is from a single chip.
...Same comparisons to log phase (which was used as the baseline) as were made with stationary phase
RNA type: CDS = coding sequence, rRNA = ribosomal RNA, tRNA = transfer RNA, misc_RNA = miscellaneous RNA.
Length (bases): Length of RNA in bases.
Annotation: Annotation from: Blattner, F.R., et al. 1997. The complete genome sequence of Escherichia coli K-12. Science 277:1453-74. Can be found at: http://www.genome.wisc.edu.
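The rules given above for the SP FC sig, SP fold change, and Cal. FC fields can be sketched in Python as follows. This is an illustration under stated assumptions, not the GAPS© implementation: the fold change is assumed to be the plain ratio of stationary phase to log phase 2max, the function and parameter names are hypothetical, and "floor" stands for the mean of the negative controls + 3 standard deviations. How decreases are encoded in the actual files is not specified in this guide.

```python
def fold_change(lp, lp_detect, sp, sp_detect, floor):
    """Return (sig, fold change) for stationary phase relative to log phase.

    lp, sp       -- 2max intensities for log and stationary phase
    *_detect     -- "P" (present) or "A" (absent)
    floor        -- mean of negative controls + 3 standard deviations
    Assumption: fold change is expressed as the ratio sp / lp.
    """
    if lp_detect == "A" and sp_detect == "A":
        return "", "ND"  # undetected in both conditions: not determined
    if lp_detect == "A":
        # The true log phase signal is at most the floor, so the true
        # ratio is at least sp / floor: the estimate is a lower limit.
        return ">", sp / floor
    if sp_detect == "A":
        # The true stationary phase signal is at most the floor, so the
        # true ratio is at most floor / lp: the estimate is an upper limit.
        return "<", floor / lp
    return "", sp / lp  # detected in both: a direct measurement

def calibrated_fold_change(measured):
    """Apply the calibration 1.2 x (measured fold change)^1.9."""
    return 1.2 * measured ** 1.9

# Invented intensities, for illustration only.
sig, fc = fold_change(lp=100.0, lp_detect="P", sp=400.0, sp_detect="P", floor=50.0)
print(sig, fc, calibrated_fold_change(fc))
```

Returning the symbol and the value as a pair mirrors the two adjacent columns in the summary files, so an estimated value is never separated from its direction.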
III. Guide to raw .CEL files
These files are provided to allow others to analyze our raw data files using GeneChip©, GAPS©, or software of their own creation. We hope that it becomes common practice to provide the raw data from expression profiling experiments to allow innovative analyses from all sectors of the scientific community. In this spirit, we believe Affymetrix has done a service to the scientific community by making the .CEL files "open source".
Definitions
"Sense" chips: chips designed to hybridize to mRNA (Affymetrix Part No. 900284)
"Antisense" chips: chips designed to hybridize with the reverse complement of mRNA (i.e. cDNA) (Not currently available commercially)
.CEL Files
log1.CEL: Antisense chip of log phase cells, labeled cDNA sample (replicate 1)
log2.CEL: Antisense chip of log phase cells, labeled cDNA sample (replicate 2)
stat1.CEL: Antisense chip of stationary phase cells, labeled cDNA sample (replicate 1)
stat2.CEL: Antisense chip of stationary phase cells, labeled cDNA sample (replicate 2)
gDNA.CEL: Antisense chip of labeled genomic DNA
stat1_cDNA_revcomp.CEL: Sense chip of stationary phase cells, labeled cDNA sample (from replicate 1). This is a reverse complement analysis in that it is probing the strand opposite of the one the chip was designed to target (discussed in the paper).
log_RNA_revcomp.CEL: Antisense chip of directly labeled RNA, according to Affymetrix's protocol. This is a reverse complement analysis. The RNA sample was from a different LB culture than log1 and log2 above.
normalization_parameters: The background and scaling factor values used in the GAPS© analyses of these files. These values can be used with GAPS© to repeat our analysis and allow comparisons between chips.
IV. Technical Notes on Annotations
The following are some observations we, and reviewers of the manuscript, have made:
- Probes on the opposite strand of b0671 (the putative ORF) actually target leuW, causing it to be highly detected in the reverse complement analyses.
- Probes which GeneChip© says target metT actually target leuW. There are no metT probes on the chip (we have BLASTed all the oligos against the E. coli genome), so the absent call in our expression analysis should be disregarded.
- Genes which have identical copies in the genome, such as rRNAs, are not distinguished by separate probes. Thus, measurements reported for one copy are actually measurements of the sum of all copies.
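When processing the summary files with a script, the notes above can be carried along as per-row caveats. The Python sketch below is one hypothetical way to do that. The column names are assumed to match the data field guide, and flagging every rRNA row is a simplification, since the note applies to any gene with identical copies in the genome.

```python
# Caveats keyed by Bnumber or gene name, paraphrasing the notes above.
CAVEATS = {
    "b0671": "opposite-strand probes target leuW; expect high signal "
             "in the reverse complement analyses",
    "metT": "probes actually target leuW; disregard the absent call",
}

def caveat_for(row):
    """Return a caveat string for a summary-file row, or None."""
    for key in (row.get("Bnumber"), row.get("Gene")):
        if key in CAVEATS:
            return CAVEATS[key]
    if row.get("RNA type") == "rRNA":
        # Simplification: treats all rRNA rows as multi-copy genes.
        return "identical copies are not distinguished; signal is the sum of all copies"
    return None

# Hypothetical row, for illustration only.
example = {"Bnumber": "b0671", "Gene": "", "RNA type": "CDS"}
print(caveat_for(example))
```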