Expression Data Set 18 details

Information Item Value
Dataset Name Vel: Log-phase + cell cycle arrested SAGE data
Dataset Number 18
Short Description SAGE expression analysis of yeast transcriptome for log phase yeast cells, G1/S arrested cells, and G2/M arrested cells.
Source URL http://db.yeastgenome.org/cgi-bin/SAGE/querySAGE
Reference Velculescu, Zhang, Zhou, Vogelstein J, Basrai, Bassett, Hieter, Vogelstein B, Kinzler; Cell 1997 (88), pp. 243-251
Strains YPH499 (MATa ura3-52 lys2-801 ade2-101 leu2-delta1 his3-delta200 trp1-delta63)
Conditions Base medium = YPD plus 6 mM uracil, 4.8 mM adenine, 24 mM tryptophan. Temp = 30 °C. L = early log phase: grow in base medium to 3 × 10^6 cells/ml. S = G1/S arrest: incubate L cells in 0.1 M hydroxyurea for 3.5 hr. G2/M = G2/M arrest: add 15 µg/ml to L cells and incubate for 100 min.
Date Added to ExpressDB Feb 23 1999 12:44:24:050PM
Number of Measures on ExpressDB 53
Long Description Expression data provided with the original reference were given by SAGE tag, with tag locations in S. cerevisiae ORFs given in a separate file. However, these locations were sometimes given by ORF name and sometimes by common gene name. To load the data into ExpressDB, they had to be transformed into a form indexed solely by ORF name. To obtain the most recent names, we remapped tag locations using SGD sequence and annotations downloaded on Feb. 15, 1999 (between 11:45 AM and noon).

In accordance with the methods of the original reference, a tag was considered located to an ORF when its exact sequence was found in the ORF or within 500 bases downstream of the ORF's 3' end. Only tags matching the coding strand of the ORF or its downstream extension were counted in abundance calculations, and the abundance of an ORF included all such matching tags. Two ORF abundances are computed for each condition: (a) "minimum" counts, the sum of abundances of the tags that locate uniquely to the ORF; and (b) "maximum" counts, the sum of abundances of all tags (unique or not) that locate to the ORF. The rationale is that a non-unique tag could have been generated from any of several ORFs, and there is no way to determine how to apportion its abundance among them.
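The location rule above can be sketched as follows. This is a minimal illustration under stated assumptions, not the script actually used to build this dataset: the `Orf` record, its 0-based end-exclusive Watson-strand coordinates, and the `genome` dict of chromosome sequences (which would include the mitochondrial chromosome) are all hypothetical. Each tag is matched as an exact substring of the ORF's coding strand extended 500 bases past its 3' end, with a naive scan rather than an index.

```python
from dataclasses import dataclass

DOWNSTREAM = 500  # bases searched past the ORF's 3' end

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


@dataclass(frozen=True)
class Orf:
    # Hypothetical ORF record; coordinates are assumptions of this sketch.
    name: str
    chrom: str
    start: int   # 0-based, inclusive, Watson-strand coordinate
    end: int     # 0-based, exclusive
    strand: str  # "+" (Watson) or "-" (Crick)


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def coding_window(orf: Orf, chrom_seq: str) -> str:
    """Coding-strand sequence of the ORF plus up to DOWNSTREAM bases past its 3' end."""
    if orf.strand == "+":
        # Python slicing stops quietly at the chromosome end.
        return chrom_seq[orf.start:orf.end + DOWNSTREAM]
    # On the Crick strand the 3' end lies at the lower Watson coordinate.
    return revcomp(chrom_seq[max(0, orf.start - DOWNSTREAM):orf.end])


def locate_tags(tags, orfs, genome):
    """Map each tag to the set of ORF names whose coding-strand window contains it exactly."""
    windows = {orf.name: coding_window(orf, genome[orf.chrom]) for orf in orfs}
    return {tag: {name for name, win in windows.items() if tag in win} for tag in tags}
```

A tag whose set has exactly one member counts toward that ORF's "minimum" count; every tag in a non-empty set counts toward the "maximum" count of each ORF in the set.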

Also in accordance with the original reference, combined abundances are given for all three conditions in addition to the individual condition abundances.
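The min/max rule and the combined condition can be sketched as below, assuming a tag-to-ORF map (tag sequence to the set of ORFs it locates to on the coding strand) and per-condition tag abundance dicts; the function names and the `ORF1`/`TAG1` identifiers are hypothetical. The combined condition is taken as the per-tag sum over L, S and G2/M, which matches the "Total" column of the statistics table in the notes.

```python
from collections import defaultdict


def orf_counts(tag_orfs, tag_abundance):
    """Return {orf_name: (min_count, max_count)} for one condition.

    tag_orfs: {tag: set of ORF names the tag locates to (coding strand)}
    tag_abundance: {tag: abundance of the tag in this condition}
    """
    min_counts = defaultdict(int)
    max_counts = defaultdict(int)
    for tag, orfs in tag_orfs.items():
        n = tag_abundance.get(tag, 0)
        for orf in orfs:
            max_counts[orf] += n  # every located tag contributes to "max"
            if len(orfs) == 1:
                min_counts[orf] += n  # only uniquely located tags contribute to "min"
    return {orf: (min_counts[orf], max_counts[orf]) for orf in max_counts}


def combine(conditions):
    """Per-tag abundances summed over all conditions (the "combined" condition)."""
    total = defaultdict(int)
    for abundances in conditions.values():
        for tag, n in abundances.items():
            total[tag] += n
    return dict(total)


if __name__ == "__main__":
    # Hypothetical toy data: TAG2 is shared by two ORFs, TAG3 locates to none.
    tag_orfs = {"TAG1": {"ORF1"}, "TAG2": {"ORF1", "ORF2"}, "TAG3": set()}
    conditions = {"L": {"TAG1": 5, "TAG2": 3, "TAG3": 7}, "S": {"TAG1": 1, "TAG2": 2}}
    print(orf_counts(tag_orfs, conditions["L"]))       # {'ORF1': (5, 8), 'ORF2': (0, 3)}
    print(orf_counts(tag_orfs, combine(conditions)))   # {'ORF1': (6, 11), 'ORF2': (0, 5)}
```

Note that the shared tag raises both ORFs' "max" counts but neither "min" count, which is why "min" can be zero while "max" is not.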

Measures presented here include the following:

- "min counts" for the L, S, G2/M and combined conditions = sum of tag abundances for tags uniquely located to the ORF.

- "max counts" for the L, S, G2/M and combined conditions = sum of tag abundances for all tags located to the ORF.

- "min % identified" for the L, S, G2/M and combined conditions = the ORF's "min count" as a percentage of the total abundance of all identified tags, i.e. tags that locate to the coding strand of at least one ORF. Note that the sum of "min % identified" over all ORFs will be less than 100%, because many tags do not locate uniquely to an ORF and their abundances are therefore not included in any "min count" value.

- "min % total" for the L, S, G2/M and combined conditions = the ORF's "min count" as a percentage of the total abundance of all tags, whether or not they locate to an ORF coding strand. Assuming all reported SAGE tags represent transcripts, even unidentifiable ones, this may be taken as a lower bound on the expression level of the ORF relative to the total expression activity of the cell. Like "min % identified", the sum of "min % total" over all ORFs will be less than 100%.

- "max % identified" for the L, S, G2/M, and combined conditions = the ORF's "max count" as a percentage of the total abundance of all identified tags, i.e. tags that locate to the coding strand of at least one ORF. Note that the sum of "max % identified" over all ORFs will be more than 100%, because tags that do not locate uniquely to an ORF are counted in more than one "max count" value.

- "max % total" for the L, S, G2/M, and combined conditions = the ORF's "max count" as a percentage of the total abundance of all tags, whether or not they locate to an ORF coding strand. Assuming all reported SAGE tags represent transcripts, even unidentifiable ones, this may be taken as an upper bound on the expression level of the ORF relative to the total expression activity of the cell. Like "max % identified", the sum of "max % total" over all ORFs cannot be expected to equal 100%, since the abundances of tags shared by multiple ORFs are counted more than once.

- "max-min counts", "max-min % identified", and "max-min % total" = for each condition, the difference between the ORF's "max count" and "min count", and likewise between its "max %" and "min %" values ("identified" and "total", respectively). These give the width of the possible range of counts and (for the "total" % values) of the relative expression levels described above. When they are 0, the ORF's abundance and relative expression levels are given exactly by its uniquely located SAGE tags. When they are non-zero, the true values are uncertain within the ranges indicated. However, the "max-min" counts and % values do not express this uncertainty relative to the precisely known "min" value. Therefore:

- "min/max counts" = a measure of the uncertainty of both counts and relative expression levels in relation to the precisely assignable "min count" and "min %" values. The ratio "max/min" would perhaps be more intuitive, but "min count" is frequently zero while "max count" is non-zero; the reverse cannot happen, so the reciprocal is given. (When both "max" and "min" are 0, "min/max" is assigned the value 1.) Thus, to find all ORFs whose maximum possible count or relative expression level is at most 25% above the minimum, the proper query condition is "min/max >= 0.8". (This is equivalent to the more intuitive condition "max/min <= 1.25".)

- "min transcripts / cell" and "max transcripts / cell" for each of the L, S, G2/M, and combined conditions = estimates of minimum and maximum transcript abundance, based on the original reference's estimate of 15,000 transcripts / cell in each condition. Numerically, the min (max) count is divided by the total number of tags (identified or not) in the condition and multiplied by 15,000. This is equivalent to applying the min (max) "% total" to 15,000.

- "min tag list", "max tag list" = lists of the sequences of tags used to compute the "min" and "max" abundances. The former is therefore the list of tags that locate uniquely to the ORF, and the latter the list of all tags located to the ORF, including those it shares with other ORFs.

- "min attributed gene list", "max attributed gene list" = the list of ORF and common gene names attributed to the "min tag list" and "max tag list" in the original data set.

- "ORFs sharing tags in max tag list" = the list of names of ORFs sharing tags in the "max tag list" for the ORF which 'owns' the current row of data. The 'owning' ORF is not included in the list of sharing ORFs.
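The per-ORF measures defined above follow directly from the min and max counts and the two condition totals ("Identified" and "Total" lines of the statistics table in the notes). The sketch below is a minimal illustration under that reading; `derived_measures` and `narrow_range` are hypothetical names, and the 15,000 transcripts / cell default is the original reference's estimate.

```python
def derived_measures(min_count, max_count, identified_total, grand_total,
                     transcripts_per_cell=15_000):
    """Per-ORF measures for one condition, following the definitions above.

    identified_total: summed abundance of identified tags ("Identified" line)
    grand_total: summed abundance of all tags, identified or not ("Total" line)
    """
    def pct(count, total):
        return 100.0 * count / total

    m = {
        "min counts": min_count,
        "max counts": max_count,
        "min % identified": pct(min_count, identified_total),
        "max % identified": pct(max_count, identified_total),
        "min % total": pct(min_count, grand_total),
        "max % total": pct(max_count, grand_total),
        # 0/0 is defined as 1; "min" never exceeds "max", so max == 0 implies min == 0.
        "min/max counts": 1.0 if max_count == 0 else min_count / max_count,
        # count / total * 15,000; multiplying first only reduces rounding error.
        "min transcripts / cell": min_count * transcripts_per_cell / grand_total,
        "max transcripts / cell": max_count * transcripts_per_cell / grand_total,
    }
    for kind in ("counts", "% identified", "% total"):
        m[f"max-min {kind}"] = m[f"max {kind}"] - m[f"min {kind}"]
    return m


def narrow_range(measures, max_over_min=1.25):
    """True when max <= max_over_min * min, expressed as min/max >= 1 / max_over_min."""
    return measures["min/max counts"] >= 1.0 / max_over_min
```

For the L condition, for example, one would call `derived_measures(min_count, max_count, 17444, 20184)`. `narrow_range` with its default reproduces the query "min/max >= 0.8"; an ORF with "min" 0 and "max" > 0 has "min/max" 0 and is excluded.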

NOTES:

- "ORFs sharing tags in max tag list" and "max tag list" have been truncated to 255 characters for a small number of ORFs to respect ExpressDB data type limits. Truncation is indicated by a final "*".

- Tag matching and abundance counting computations resulted in the following statistics:

Columns: #Tags = number of distinct tag sequences; L, S, G2/M = tag abundances in each condition; Total = combined abundance (L + S + G2/M).

                                     #Tags      L      S   G2/M   Total
Non genome located                    2902   1288   1317   1632    4237
Non ORF located                        541    761    631    538    1930
Non coding located                    1603    691   1673    831    3195
Not identified (sum of above)         5046   2740   3621   3001    9362
Identified (ORF, coding loc'd)        6346  17444  16413  17414   51271
Total (identified + not identified)  11392  20184  20034  20415   60633
% identified                          55.7   86.4   81.9   85.3    84.6

The abundances in the "Identified" line were used in computing min and max "identified" % values and ranges, while those in the "Total" line were used in computing the "total" % values and ranges.

- This analysis did not take into account the 2-micron plasmid sequence, which was considered in the original reference; there, 91 tags were found to locate to the 2-micron plasmid.

- This analysis did take into account the mitochondrial chromosome (as did the original reference).

- No attempt to identify "NORFs" (non-annotated ORFs) was made in this analysis. In the original reference 160 NORFs were identified.
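The statistics table in the notes is internally consistent: "Not identified" is the sum of the three non-identified lines, "Total" is "Identified" plus "Not identified", the Total column is L + S + G2/M, and "% identified" is Identified / Total. The sketch below recomputes those lines from numbers transcribed from the table. It also includes a hypothetical `truncate_field` helper for the 255-character truncation note; whether the 255-character limit counts the final "*" is not stated, so the helper assumes it does.

```python
# Columns (#Tags, L, S, G2/M, Total) transcribed from the statistics table above.
NON_IDENTIFIED = {
    "Non genome located": (2902, 1288, 1317, 1632, 4237),
    "Non ORF located": (541, 761, 631, 538, 1930),
    "Non coding located": (1603, 691, 1673, 831, 3195),
}
IDENTIFIED = (6346, 17444, 16413, 17414, 51271)


def column_sums(rows):
    """Element-wise sum of equal-length tuples."""
    return tuple(sum(col) for col in zip(*rows))


def percent_identified(identified, not_identified):
    """% identified = identified / (identified + not identified), rounded to one decimal."""
    return tuple(round(100 * i / (i + n), 1) for i, n in zip(identified, not_identified))


def truncate_field(text, limit=255, marker="*"):
    """Cut text to at most `limit` characters, ending in `marker` if anything was cut.

    Assumption: the marker counts toward the limit.
    """
    if len(text) <= limit:
        return text
    return text[: limit - len(marker)] + marker


not_identified = column_sums(NON_IDENTIFIED.values())            # "Not identified" line
total = tuple(i + n for i, n in zip(IDENTIFIED, not_identified))  # "Total" line
```

Running `percent_identified(IDENTIFIED, not_identified)` reproduces the table's "% identified" line, (55.7, 86.4, 81.9, 85.3, 84.6).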

These notes were made by John Aach on Feb. 23, 1999, and revised on August 18, 1999.

Please contact Wayne Rindone for more information, or with any questions, comments, or concerns.

Copyright (c) 2006 by Wayne Rindone and the President and Fellows of Harvard University