systemmgmt2_directSQL

Supplemental material from "Systematic management and analysis of yeast expression data"

John Aach, Wayne Rindone, and George Church

Department of Genetics, Harvard Medical School, 200 Longwood Ave., Boston, MA, 02115, USA

and

Lipper Center for Computational Functional Genomics, 77 Avenue Louis Pasteur, Boston, MA, 02115, USA

Example of direct submission of Structured Query Language to ExpressDB

ExpressDB can also be accessed in other ways than the EXD application described in the text. First, we developed a tool for bulk batch database extracts that was used in the generation of the estimated relative abundance file. Second, users familiar with Structured Query Language (SQL) can compose custom queries and have them submitted directly to the database. As a case in point, a query was submitted to determine which time point of the seven provided in the Chu_spo set of experiments (see Table 1) is closest to the initial time point, where closeness was measured as the mean square difference (MSD), over all ORFs, of time point to control condition mRNA expression ratios compared to the ratios for the initial time point. The database returned the reasonable result that expression profiles of cells undergoing sporulation get progressively more distant from the initial profile for the early time points (note, however, time point 7): (time in hours, MSD) = (0.5, 0.764), (2, 0.869), (5, 2.988), (9, 4.916), (7, 10.095), (11.5, 13.358). NOTE: This computation did not exclude values called in error.

Please contact John Aach if you are interested in submitting SQL directly to ExpressDB.

SQL submitted to database

use ExpressDB
go
set statistics time on
go
-- experiment to see if one can design SQL that compares two measures
select MeasureName=substring(M2.MeasureName,1,30), 
RMS=avg(power(convert(real,E1.Value-E2.Value),2))
from Measure M1, Measure M2, EDPD E1, EDPD E2
where M1.MeasureName = "R/G:0" 
and M2.MeasureNo != M1.MeasureNo
and M2.MeasureName like "R/G:%"
and E1.MeasureNo = M1.MeasureNo
and E2.MeasureNo = M2.MeasureNo
and E1.ORFNo = E2.ORFNo
and E1.DataPointNo = E2.DataPointNo
group by M2.MeasureName
order by RMS
go

Database answer set

1>; use ExpressDB
1>; set statistics time on
Execution Time 0.
SQL Server cpu time: 0 ms. SQL Server elapsed time: 1094018792 ms.
1> -- experiment to see if one can design SQL that compares two measures
2> select MeasureName=substring(M2.MeasureName,1,30), 
3> RMS=avg(power(convert(real,E1.Value-E2.Value),2))
4> from Measure M1, Measure M2, EDPD E1, EDPD E2
5> where M1.MeasureName = "R/G:0" 
6> and M2.MeasureNo != M1.MeasureNo
7> and M2.MeasureName like "R/G:%"
8> and E1.MeasureNo = M1.MeasureNo
9> and E2.MeasureNo = M2.MeasureNo
10> and E1.ORFNo = E2.ORFNo
11> and E1.DataPointNo = E2.DataPointNo
12> group by M2.MeasureName
13> order by RMS
Parse and Compile Time 0.
SQL Server cpu time: 0 ms.
MeasureName			RMS 
------------------------------	-------------------- 
R/G:0.5 			0.764014 
R/G:2 				0.869452 
R/G:5 				2.987885 
R/G:ndt80-delete-early 		3.662190 
R/G:gal-ndt80 			4.765511 
R/G:9 				4.916045 
R/G:ndt80-delete-mid 		5.363342 
R/G:7 				10.094530 
R/G:11.5 			13.358002 
Execution Time 13920.
SQL Server cpu time: 1392000 ms. SQL Server elapsed time: 2292063 ms.
(9 rows affected)

Performance

ExpressDB took ~ 38.2 minutes to process this query. This reflects the size of the database at the time of submission (~17.5 M "data point" records), as well as other activity on the system at the time. Early in the loading of ExpressDB, the above query took ~ 3 minutes.