|Dataset Name||Normalized ERA Dataset|
|Short Description||This data set contains estimated relative abundances (ERAs) computed from yeast gene expression level data from 9 references as described in an article we published in Genome Research.|
|Reference||Aach, J., Rindone, W., Church, G.M. (2000) Systematic management and analysis of yeast gene expression data. Genome Research 10: 431-445.|
|Date Added to ExpressDB||Oct 20 1999 3:37:38:646PM|
|Number of Measures on ExpressDB||214 (here to download dataset and view measure details)|
|Long Description||This data set contains estimated relative abundances (ERAs) computed from yeast gene expression level data from 9 references. Procedures for computing ERAs depended on the method by which the original data were collected and are detailed in the Genome Research article cited above. The following is a brief summary of methods; for rationales and details, please consult the article.
Affymetrix
1. ORF names in the original data set were assigned to Saccharomyces Genome Database (SGD) ORF names by matching them against a historical archive of SGD ORF 'Location' and 'Other Feature' files maintained on another database (BIGED). Some matching of hyphenation variants was also performed.
2. Multiple expression level values are presented in the original data for some ORFs in every condition due to the presence of multiple probe sets for the ORF in the Affymetrix chip set. For all ORFs, a single consolidated value was computed by taking the average of all PM-MM values for the ORF coming from those probe sets that come first in this list: (1) Probe sets with oligos that simply specify the gene name and are unqualified by exon locations and Affymetrix feature codes (esp. _r, _f, and _i). (2) Probe sets with oligos from exon 2 that are otherwise unqualified. (3) Probe sets with oligos from exon 1 that are otherwise unqualified. (4) Any other probe set.
3. To eliminate negative values, all values below a certain threshold were set to that threshold. The threshold used was the 5th percentile of all values where this was positive, and the 5th percentile of positive values otherwise.
4. ERAs for each ORF were computed by dividing each ORF's value, as computed through step 3, by the total of these values over all ORFs (control feature values excluded).
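Steps 2 through 4 above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used to build the data set: the function names, the (priority, value) input layout, and the nearest-rank percentile rule are all hypothetical choices, since the summary does not say how the 5th percentile was calculated. The input values are made up, and control features are assumed to have been removed already.

```python
import math
from statistics import mean

def consolidate(probe_values):
    """Average the PM-MM values from the most preferred probe-set class.

    probe_values: list of (priority, value) pairs, where priority 1..4
    follows the order of the list in step 2 (hypothetical input layout).
    """
    best = min(priority for priority, _ in probe_values)
    return mean(value for priority, value in probe_values if priority == best)

def percentile_nearest_rank(values, pct):
    """Nearest-rank percentile (an assumed convention, not from the source)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]

def floor_threshold(values):
    """Step 3: 5th percentile of all values if positive, else of positive values."""
    threshold = percentile_nearest_rank(values, 5)
    if threshold <= 0:
        threshold = percentile_nearest_rank([v for v in values if v > 0], 5)
    return threshold

def compute_eras(value_by_orf):
    """Steps 3 and 4: floor low values, then divide each value by the total."""
    threshold = floor_threshold(list(value_by_orf.values()))
    floored = {orf: max(v, threshold) for orf, v in value_by_orf.items()}
    total = sum(floored.values())
    return {orf: v / total for orf, v in floored.items()}

# Illustrative values only.
raw = {
    "YAL001C": [(1, 120.0), (1, 80.0), (4, 500.0)],  # class-4 value is ignored
    "YAL002W": [(2, -15.0)],                         # negative, will be floored
    "YAL003W": [(1, 300.0)],
}
consolidated = {orf: consolidate(values) for orf, values in raw.items()}
eras = compute_eras(consolidated)
```

Because every value is divided by the same total, the ERAs computed this way sum to 1 across the ORFs included.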
Microarray
NOTE: Microarray ERAs exhibit increased variability compared to microarray ratios because they cannot be computed from ratios but only estimated from background-subtracted spot intensities. We estimate that ERAs are roughly 3.3 times more variable than comparable microarray ratios (as measured by the ratio of the coefficients of variation of spot ERAs to those of spot ratios, for ORFs with multiple spots in a large sample). We emphasize the preliminary nature of these values and advise caution in interpreting them. For details on this issue and possible ways of improving the computation of microarray ERAs, refer to our article and its supplementary material.
1. ORF names were standardized as per Affymetrix 1.
2. Multiple background-subtracted spot intensity values for an ORF in a condition were consolidated into a single value by (a) excluding any value that failed data set quality tests and (b) taking the maximum of the remaining values (see the article supplementary material for details).
3. For series of experiments in which a single control condition was used to obtain ratios for several experimental conditions, the multiple control condition data series were reduced to a single series by taking the median value for each ORF, and each experimental value for the ORF was adjusted based on this median and the control value obtained on the same array as the experimental value.
4. ERAs were computed as per Affymetrix 4.
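A rough sketch of steps 2 and 3 for microarray data follows. The function names and input layouts are hypothetical, and the quality tests are represented only by a precomputed pass/fail flag. Note in particular that the summary does not give the adjustment formula used in step 3; the rescaling shown in `adjust_experimental` is one plausible reading and is an assumption, so consult the article supplementary material for the actual procedure.

```python
from statistics import median

def consolidate_spots(spots):
    """Step 2: drop spots that failed quality tests, keep the maximum.

    spots: list of (intensity, passed_quality) pairs for one ORF in one
    condition (hypothetical layout). Returns None if no spot passed.
    """
    kept = [intensity for intensity, passed in spots if passed]
    return max(kept) if kept else None

def median_control(control_by_array):
    """Step 3, first part: per-ORF median over the repeated control series.

    control_by_array: {array_id: {orf: control intensity}}.
    """
    orfs = set().union(*(series.keys() for series in control_by_array.values()))
    return {
        orf: median(s[orf] for s in control_by_array.values() if orf in s)
        for orf in orfs
    }

def adjust_experimental(exp_value, same_array_control, control_median):
    """Step 3, second part.

    ASSUMPTION: rescale the experimental value by the ratio of the median
    control to the control measured on the same array. The source does not
    state the formula.
    """
    return exp_value * control_median / same_array_control

# Illustrative values only.
best_spot = consolidate_spots([(150.0, True), (900.0, False), (210.0, True)])
controls = {
    "array1": {"YAL001C": 100.0},
    "array2": {"YAL001C": 120.0},
    "array3": {"YAL001C": 200.0},
}
control_series = median_control(controls)
adjusted = adjust_experimental(50.0, controls["array1"]["YAL001C"],
                               control_series["YAL001C"])
```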
SAGE (Serial Analysis of Gene Expression)
1. For the single SAGE data set included in this analysis, values were extracted for those ORFs whose 'minimum' tag count equals their 'maximum' tag count. This is a reduced set of ORFs whose tag counts are entirely unambiguous. Go here for further details.
2. ERAs were computed by dividing the tag counts for these ORFs by the total number of tags counted provided by the original reference.
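The two SAGE steps can be illustrated with a short sketch. The function name, the input layout, and all numbers below are hypothetical; the only things taken from the description are the min = max filter and the division by the total tag count reported by the original reference.

```python
def sage_eras(tag_counts, total_tags):
    """Compute SAGE ERAs for ORFs with unambiguous tag counts.

    tag_counts: {orf: (min_count, max_count)} (hypothetical layout).
    total_tags: total number of tags counted, as reported by the source.
    """
    # Step 1: keep only ORFs whose minimum and maximum counts agree.
    unambiguous = {orf: lo for orf, (lo, hi) in tag_counts.items() if lo == hi}
    # Step 2: divide each retained count by the total number of tags.
    return {orf: count / total_tags for orf, count in unambiguous.items()}

# Illustrative values only.
counts = {
    "YAL001C": (3, 3),
    "YAL002W": (1, 4),    # ambiguous, dropped
    "YAL003W": (12, 12),
}
sage = sage_eras(counts, total_tags=60000)
```

Unlike the Affymetrix and microarray ERAs, values computed this way need not sum to 1, because the denominator is the total number of tags counted rather than the sum over the retained ORFs.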
MEASURES PROVIDED IN THE ERA DATA SET
With one exception, all Measures provide ERAs calculated as described above for each condition in the 9 references for which they were computed. The original source data set and condition can be determined from the name of the Measure. Each Measure name begins with a series code that matches a data set name in the ExpressDB yeast datasets. The rest of the Measure name indicates the condition in that data set to which the computed ERAs correspond. For instance, Der_diaux_exp_21 is the 21 hour time point from the series of experiments whose series code is Der_diaux (a description of which can be found by following the hotlink above).
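Since series codes themselves contain underscores, a Measure name cannot be split at the first underscore. One way to recover the series code and condition, sketched below, is to match against a list of known series codes. The helper name is hypothetical, the underscore separator is inferred from the example names, and the code list shown contains only the two series codes mentioned in this description; the full list would come from the ExpressDB yeast datasets.

```python
def split_measure_name(measure, series_codes):
    """Split a Measure name into (series_code, condition).

    Chooses the longest known series code that prefixes the name, assuming
    the code and the condition are joined by an underscore.
    """
    matches = [
        code for code in series_codes
        if measure == code or measure.startswith(code + "_")
    ]
    if not matches:
        raise ValueError(f"no known series code matches {measure!r}")
    series = max(matches, key=len)
    return series, measure[len(series) + 1:]

known_codes = ["Der_diaux", "Chu_spo"]  # partial list, for illustration
parsed = split_measure_name("Der_diaux_exp_21", known_codes)
```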
Note that for ERAs derived from microarray-based experiments, control conditions often represent the same condition as one of the experimental conditions. Since ERAs for microarray-based experiments are computed from intensity series and not ratio series in this data set (see above), two ERA series may appear for the same condition. For instance, Chu_spo_exp_0 and Chu_spo_cont both represent the RNA sample taken for the initial time point of the Chu_spo series. However, the latter is derived from the sample labeled for use as a control sample and measured multiple times (once for each array used), while the former is derived from the same sample labeled as the experimental sample for the initial time point and measured on only one array.
The one Measure on the data set that does not represent an ERA measurement is NameStatus. It describes how different ORF names used in the original source data sets may have been consolidated into the single standard ORF name under which ERAs are reported.
Please contact Wayne Rindone for more information, or with any questions, comments, or concerns.
Copyright (c) 2006 by Wayne Rindone and the President and Fellows of Harvard University