Computational Discovery of Sense-Antisense Transcription in the Human & Mouse Genomes [1]

 

Experimental evidence suggests potential functionality for long perfect duplexes (e.g. sense-antisense dsRNA) in mammalian gene regulation at a surprising variety of levels, including genomic imprinting [2,3], RNA interference [4], translational regulation [5], alternative splicing [6], X-inactivation [7], and RNA editing [8].  We have mined publicly available ESTs in mouse and human UniGene clusters for transcripts that share a region of genomic origin with a distinct RNA species oriented in the opposite direction.  Lipman [9] suggested that many of the stretches of evolutionary conservation in the 5’ and 3’ untranslated regions of mammalian genes may be explained by functionally relevant, overlapping, oppositely oriented transcripts.

 

The primary goal of this website is to provide graphical representations of the evidence for candidate pairings of sense-antisense RNA species that were presumably co-clustered in the NCBI UniGene resource as a consequence of a region of overlap.  Our data take the form of mappings of the exon-intron structures of ESTs and mRNAs along genomic coordinates.  We applied the MEGABLAST and SIM4 tools [11,12] to map EST and mRNA sequences from NCBI UniGene clusters to the NCBI assembly of the human genome and the Celera assembly of the mouse genome.  The data are complemented by HUMMUS, a set of 1.15 million alignments of sequences highly conserved between the mouse and human genomes.  This page provides a small set of examples with more extensive documentation and commentary.  A full set of similar graphical representations is available for all 144 human candidates and all 74 mouse candidates.

 

We hypothesized that ESTs derived from overlapping transcripts would be inadvertently co-clustered in UniGene.  We calculated directional-cloning library quality scores (LQS), and focused on ESTs derived from libraries with an LQS greater than 0.95 (see text for methods).  A full list of libraries and their calculated LQS values is linked under Related Links below.  Our focus is on UniGene clusters containing statistically significant numbers of mis-oriented ESTs.  We mapped the relevant ESTs and mRNAs from these clusters back to genomic sequence, and selected candidate clusters that appeared to contain two distinct RNA species, oppositely oriented and overlapping.
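The filtering logic in the preceding paragraph can be sketched roughly as follows.  The page defers the actual LQS definition and the significance test to the paper, so this sketch makes two loud assumptions: that the LQS is simply the fraction of a library's ESTs oriented as expected, and that "statistically significant numbers of mis-oriented ESTs" means a one-sided binomial test against a background mis-orientation rate.  The function names, the 5% error rate, the 0.01 significance cutoff, and the example libraries are all hypothetical.

```python
from math import comb

def library_quality_score(n_correct: int, n_total: int) -> float:
    """Hypothetical LQS: fraction of a library's ESTs oriented as expected."""
    if n_total <= 0:
        raise ValueError("library has no oriented ESTs")
    return n_correct / n_total

def binomial_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

def has_excess_misoriented(n_mis: int, n_total: int,
                           error_rate: float = 0.05,
                           alpha: float = 0.01) -> bool:
    """Flag a cluster whose mis-oriented EST count is unlikely under the
    assumed background error rate (both thresholds are illustrative)."""
    return binomial_tail(n_mis, n_total, error_rate) < alpha

# Illustrative libraries: (correctly oriented ESTs, total ESTs).
libraries = {"lib_A": (980, 1000), "lib_B": (900, 1000)}
trusted = [name for name, (ok, total) in libraries.items()
           if library_quality_score(ok, total) > 0.95]
```

Under these assumptions, only libraries above the 0.95 cutoff contribute ESTs, and a cluster is carried forward only when its mis-oriented ESTs are too numerous to attribute to residual cloning errors.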

 

Each UniGene cluster is assigned a “best-of-UniGene” representative (“BOU”), the longest and highest-quality sequence present in the cluster.  We present here the genomic alignments of the BOU and of the sense- and antisense-oriented ESTs from directionally-cloned libraries.  We also incorporated information on human-mouse homology (“HUMMUS”) into these alignments.  We present on this page only a few interesting or representative candidates with some commentary.  Similar genomic maps are available for all human and all mouse candidates.
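One way to picture the data behind these maps is as a list of exon intervals per aligned sequence, with each EST labelled sense or antisense relative to the BOU.  The sketch below is illustrative only: the `Alignment` class, the half-open coordinate convention, and the example coordinates are assumptions made for this example and are not taken from the site's own pipeline.

```python
from dataclasses import dataclass
from typing import List, Tuple

Exon = Tuple[int, int]  # half-open [start, end) genomic coordinates (assumed)

@dataclass
class Alignment:
    name: str
    strand: str        # "+" or "-": inferred strand of the transcript
    exons: List[Exon]  # assumed non-overlapping within one alignment

def orientation(est: Alignment, bou: Alignment) -> str:
    """Label an EST relative to the cluster's BOU sequence."""
    return "sense" if est.strand == bou.strand else "antisense"

def exonic_overlap(a: Alignment, b: Alignment) -> int:
    """Number of genomic bases covered by exons of both alignments."""
    total = 0
    for s1, e1 in a.exons:
        for s2, e2 in b.exons:
            total += max(0, min(e1, e2) - max(s1, s2))
    return total

# Hypothetical cluster: a two-exon BOU and two ESTs.
bou = Alignment("BOU", "+", [(100, 200), (300, 400)])
est_sense = Alignment("EST_1", "+", [(150, 200), (300, 350)])
est_anti = Alignment("EST_2", "-", [(380, 600)])
```

An antisense-labelled EST with a non-zero exonic overlap against the BOU, such as `EST_2` here, is the kind of evidence the maps are meant to display.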

 

Sample Candidates with Brief Commentary

 

Graphical representations for all candidates are available via links on this page.  However, we selected a subset of interesting or representative examples to display here with brief commentary.  Click on a UniGene cluster name to pull up the regular candidate report.

 

*  Hs.288835

CIDEB: cell-death-inducing DFFA-like effector B.  The upper graph depicts the exon-intron mappings of the Best-Of-UniGene representative (BOU) and of the ESTs belonging to UniGene cluster Hs.288835 that were derived from directionally-cloned EST libraries.  This is an interesting example of potential upregulation of antisense ESTs to a tumor suppressor in cancer-derived tissues.  The X-axis of the map reflects genomic coordinates along NCBI contig Hs14_19739_24.  Each Y-axis position is assigned to a single EST or mRNA sequence.  The graph can be viewed at greater resolution by following the link to the candidate page.  A legend is provided in the links frame to the left of this frame.  In this case, the BOU mRNA (in green and blue) is oriented from left to right with respect to the genomic contig.  Immediately below the mRNA mapping, we mark (in gold) the regions of the genome that are highly conserved according to HUMMUS, a set of 1.15 million “islands” of mouse-human conservation.  As expected, there is a strong correlation between the locations of the coding exons in the BOU mRNA sequence and the locations of the HUMMUS-defined islands of conservation.  Exons of sense ESTs are depicted in yellow and exons of antisense ESTs in white.  Without exception, the sense-oriented ESTs match the splice pattern of the BOU mRNA sequence.  The antisense ESTs are clearly not spliced in the same manner as the sense ESTs or the BOU mRNA, suggesting that they are derived from an oppositely oriented transcript (presumably unspliced, at least in the region that we are observing).

 

We checked the annotated tissue origin of these ESTs, and found that a significantly greater fraction of the antisense ESTs (34/46) than of the sense ESTs (3/15) were derived from neoplastic tissues (p ≈ 0.0001 by the chi-squared statistic).  As the sense transcript codes for a pro-apoptotic gene, this result suggests the hypothesis that upregulation of the antisense RNA species in cancer tissues is functionally relevant to the suppression of the sense transcript.
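The comparison above can be reproduced as a 2×2 contingency test on the quoted counts (34 neoplastic and 12 other antisense ESTs; 3 neoplastic and 12 other sense ESTs).  The sketch below assumes an uncorrected Pearson chi-squared test with one degree of freedom; the page does not state whether a continuity correction or another variant was used, so the exact p-value may differ from the one quoted.

```python
from math import erfc, sqrt

def chi_squared_2x2(a: int, b: int, c: int, d: int):
    """Pearson chi-squared test (no continuity correction) for the table
        [[a, b],
         [c, d]]
    Returns (statistic, p_value) with one degree of freedom."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = erfc(sqrt(chi2 / 2))
    return chi2, p_value

# Rows: antisense ESTs, sense ESTs.  Columns: neoplastic, non-neoplastic.
chi2, p_value = chi_squared_2x2(34, 12, 3, 12)
print(f"chi2 = {chi2:.2f}, p = {p_value:.1e}")
```

With these counts the statistic comes to about 13.8, and the p-value is on the order of 10^-4, in line with the approximate value quoted above.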

 

 

*  Mm.10022

Homer3-pending: Homer, neuronal immediate early gene, 3.  The graph depicts the exon-intron mappings of the Best-Of-UniGene representative (BOU) and of the ESTs belonging to UniGene cluster Mm.10022 that were derived from directionally-cloned EST libraries.  This is a relatively straightforward example of two oppositely oriented, clearly overlapping transcripts that are both spliced and that both appear to code for distinct proteins.  It is also notable as a clear example of strong coincidence between the 3’ UTR islands of conservation and the exon splicing structure of the oppositely oriented transcript.  The X-axis of the map reflects genomic coordinates along Celera contig GA_x5J8B7W3T6H.  Each Y-axis position is assigned to a single EST or mRNA sequence.  The graph can be viewed at greater resolution by following the link to the candidate page.  A legend is provided in the links frame to the left of this frame.  In this case, the BOU mRNA (in green and blue) is oriented from left to right with respect to the genomic contig.  Immediately below the mRNA mapping, we mark (in gold) the regions of the genome that are highly conserved according to HUMMUS, a set of 1.15 million “islands” of mouse-human conservation.  Note the strong coincidence between sense coding exons in the mRNA and spikes of conservation in HUMMUS.  The N-terminal regions of the protein appear to be less conserved than the C-terminal regions.  A similar coincidence is seen for the exons of the putative antisense transcript, which bear strong protein homology to a human ATP-dependent RNA helicase.  The antisense exons provide a potential explanation for the stretches of strong conservation in the 3’ UTR of the sense transcript.

 

*  Mm.148209

Synaptonemal complex protein 3.  The graph depicts the exon-intron mappings of the Best-Of-UniGene representative (BOU) and of the ESTs belonging to UniGene cluster Mm.148209 that were derived from directionally-cloned EST libraries.  This is an interesting example in which the antisense species appears to be non-coding and overlaps only a single internal coding exon of the sense transcript.  The X-axis of the map reflects genomic coordinates along Celera contig GA_x5J8B7W5VG6.  Each Y-axis position is assigned to a single EST or mRNA sequence.  The graph can be viewed at greater resolution by following the link to the candidate page.  A legend is provided in the links frame to the left of this frame.  In this case, the BOU mRNA (in green and blue) is oriented from left to right with respect to the genomic contig.  Immediately below the mRNA mapping, we mark (in gold) the regions of the genome that are highly conserved according to HUMMUS, a set of 1.15 million “islands” of mouse-human conservation.  As expected, there is a strong correlation between the locations of the coding exons in the BOU mRNA sequence and the locations of the HUMMUS-defined islands of conservation.  It is difficult to know what the functional relevance of the putative antisense species might be.  It is worth noting, however, that the antisense ESTs are derived from multiple independent libraries, and that there does appear to be an appropriately located polyA signal, so we may be observing the 3’ end of a larger transcript.  If so, it is possible that other regions of this transcript contain an ORF.

 

 

*  Hs.25197

STIP1 homology and U-Box containing protein 1.  The graph depicts the exon-intron mappings of the Best-Of-UniGene representative (BOU) and of the ESTs belonging to UniGene cluster Hs.25197 that were derived from directionally-cloned EST libraries.  This is a nice example of two relatively abundant transcripts with a fairly substantial region of overlap.  The X-axis of the map reflects genomic coordinates along NCBI contig Hs16_10709_24.  Each Y-axis position is assigned to a single EST or mRNA sequence.  The graph can be viewed at greater resolution by following the link to the candidate page.  A legend is provided in the links frame to the left of this frame.  In this case, the BOU mRNA (in green and blue) is oriented from right to left with respect to the genomic contig.  Immediately below the mRNA mapping, we mark (in gold) the regions of the genome that are highly conserved according to HUMMUS, a set of 1.15 million “islands” of mouse-human conservation.  The antisense transcript is not annotated, but it matches the protein product of a GenScan prediction over this region and has some homology to a C. elegans predicted protein, suggesting that it contains a functional ORF.  The ORFs themselves do not overlap.  The overlap consists primarily of the 3’ UTR of the antisense transcript with the 3’ UTR and several coding exons of the sense transcript (to reiterate, the sense transcript is oriented from right to left in this graph).  Note that the 3’ UTR of the sense transcript (fully overlapped by that of the antisense transcript) is highly conserved.

 

 

*  Hs.113916

Burkitt lymphoma receptor 1, GTP-binding protein.  The graph depicts the exon-intron mappings of the Best-Of-UniGene representative (BOU) and of the ESTs belonging to UniGene cluster Hs.113916 that were derived from directionally-cloned EST libraries.  This is an interesting example of potential upregulation of antisense ESTs to a tumor suppressor in cancer-derived tissues.  The X-axis of the map reflects genomic coordinates along NCBI contig Hs11_9491_24.  Each Y-axis position is assigned to a single EST or mRNA sequence.  The graph can be viewed at greater resolution by following the link to the candidate page.  A legend is provided in the links frame to the left of this frame.  In this case, the BOU mRNA (in green and blue) is oriented from left to right with respect to the genomic contig.  Immediately below the mRNA mapping, we mark (in gold) the regions of the genome that are highly conserved according to HUMMUS, a set of 1.15 million “islands” of mouse-human conservation.  The sense transcript is unspliced, but there is clearly a coincidence between the sense coding region (in blue) and mouse-human conservation.  The antisense ESTs intersect the 3’-most portion of the 3’ UTR of the sense transcript.  They contain appropriately located polyadenylation signals, so we are probably observing the 3’ tail of the antisense transcript.  The antisense ESTs have no significant protein homologies.  It is worth noting that although there are several islands of conservation in the 3’ UTR of the sense transcript, the most conserved stretch in the 3’ UTR coincides with the overlap of the antisense ESTs.

 

 

*  Hs.125819

Putative dimethyladenosine transferase.  The graph depicts the exon-intron mappings of the Best-Of-UniGene representative (BOU) and of the ESTs belonging to UniGene cluster Hs.125819 that were derived from directionally-cloned EST libraries.  In this example, there appear to be two distinct antisense transcripts overlapping the sense transcript (alternatively, two variants of a single antisense transcript).  The X-axis of the map reflects genomic coordinates along NCBI contig Hs5_6844_24.  Each Y-axis position is assigned to a single EST or mRNA sequence.  The graph can be viewed at greater resolution by following the link to the candidate page.  A legend is provided in the links frame to the left of this frame.  In this case, the BOU mRNA (in green and blue) is oriented from right to left with respect to the genomic contig.  The graph is zoomed in on the 3’ coding exons and the 3’ UTR of the sense transcript.  Immediately below the mRNA mapping, we mark (in gold) the regions of the genome that are highly conserved according to HUMMUS, a set of 1.15 million “islands” of mouse-human conservation.  As expected, there is a strong correlation between the locations of the coding exons in the BOU mRNA sequence and the locations of the HUMMUS-defined islands of conservation.  The antisense ESTs have no significant protein homologies, and do not appear to be spliced.  Notably, however, there appear to be two potential termini for the antisense ESTs (which are oriented left to right).  One terminus coincides with an island of conservation within the 3’ UTR of the sense transcript.  The second terminus coincides with the last internal coding exons of the sense transcript.  In both cases, the sequence near the putative terminus contains an appropriately located polyadenylation signal.
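The page does not spell out what counts as an "appropriately located" polyadenylation signal.  As a rough illustration only, the sketch below assumes the common rule of thumb that the canonical hexamer AATAAA (or the frequent variant ATTAAA) lies roughly 10 to 30 bases upstream of the transcript's 3’ end.  The function name, the distance window, and the test sequence are hypothetical, and the input is assumed to be in transcript orientation with any poly(A) tail already trimmed.

```python
from typing import List, Tuple

# Canonical signal and its most common variant, written as DNA.
POLYA_SIGNALS = ("AATAAA", "ATTAAA")

def find_polya_signals(seq: str, min_dist: int = 10,
                       max_dist: int = 30) -> List[Tuple[str, int]]:
    """Return (hexamer, distance) pairs, where distance is the number of
    bases between the end of the hexamer and the 3' end of `seq`."""
    seq = seq.upper()
    hits = []
    for hexamer in POLYA_SIGNALS:
        start = seq.find(hexamer)
        while start != -1:
            distance = len(seq) - (start + len(hexamer))
            if min_dist <= distance <= max_dist:
                hits.append((hexamer, distance))
            start = seq.find(hexamer, start + 1)
    return hits

# Hypothetical 3' end: a signal followed by 20 bases of downstream sequence.
example = "GCGC" * 5 + "AATAAA" + "C" * 20
print(find_polya_signals(example))
```

A hit near each of the two putative termini would support the interpretation that they are genuine 3’ ends rather than artifacts of truncated clones.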

 

                                                                                                                                               

*  Hs.47313

KIAA0258.  The graph depicts the exon-intron mappings of the Best-Of-UniGene representative (BOU) and of the ESTs belonging to UniGene cluster Hs.47313 that were derived from directionally-cloned EST libraries.  We present this as an example of how an antisense transcript can provide a potential explanation for conserved stretches in the 3’ UTR of a transcript.  The X-axis of the map reflects genomic coordinates along NCBI contig Hs9_28427_24.  Each Y-axis position is assigned to a single EST or mRNA sequence.  The graph can be viewed at greater resolution by following the link to the candidate page.  A legend is provided in the links frame to the left of this frame.  In this case, the BOU mRNA (in green and blue) is oriented from left to right with respect to the genomic contig.  Immediately below the mRNA mapping, we mark (in gold) the regions of the genome that are highly conserved according to HUMMUS, a set of 1.15 million “islands” of mouse-human conservation.  The height of each bar in this row is derived from the % identity over a 50 base-pair window centered on each base pair.  As expected, there is a strong correlation between the locations of the coding exons in the BOU mRNA sequence and the locations of the HUMMUS-defined islands of conservation.  Exons of sense ESTs are depicted in yellow and exons of antisense ESTs in white.  What is evident from this graph is that the exon-intron splicing pattern of the antisense ESTs is clearly distinct from that of the BOU mRNA and sense ESTs.  This strengthens the claim that these represent distinct RNA species inadvertently co-clustered into a single UniGene cluster by virtue of an overlap.
Also striking is the observation that the islands of conservation in the 3’ UTR of the BOU mRNA are largely coincident with the splicing pattern of the putative antisense transcript, providing at least a potential basis for the conserved elements observed in the 3’ UTR of this mRNA.  In this case, the putative antisense mRNA species does have strong homology to a known protein, suggesting that it is a coding mRNA.
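The bar heights in the conservation row are described as % identity over a 50 base-pair window centered on each base pair.  A minimal sketch of such a profile is given below.  It assumes a gapped pairwise alignment supplied as two equal-length strings, treats a gap as a mismatch, and takes the window around position i to be [i - 25, i + 25), clipped at the ends of the alignment; none of these conventions is confirmed by the page, and the function name is hypothetical.

```python
from typing import List

def windowed_identity(aligned_a: str, aligned_b: str,
                      window: int = 50) -> List[float]:
    """Percent identity in a window around each alignment column."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if window < 2:
        raise ValueError("window must be at least 2")
    # 1 where both sequences carry the same base, 0 for mismatch or gap.
    matches = [1 if x == y and x != "-" else 0
               for x, y in zip(aligned_a.upper(), aligned_b.upper())]
    # Prefix sums make each window total an O(1) lookup.
    prefix = [0]
    for m in matches:
        prefix.append(prefix[-1] + m)
    half = window // 2
    n = len(matches)
    profile = []
    for i in range(n):
        lo = max(0, i - half)
        hi = min(n, i + half)
        profile.append(100.0 * (prefix[hi] - prefix[lo]) / (hi - lo))
    return profile

# Toy alignment: identical first half, fully mismatched second half.
profile = windowed_identity("A" * 100, "A" * 50 + "C" * 50)
```

Plotted along genomic coordinates, a profile of this kind produces the "islands" that can then be compared against exon positions on either strand.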

 

Related Links      

 

HUMMUS: Whole Genome analysis of Mouse-Human Conservation

 

List of Directionally Cloned Human & Mouse NCBI UniGene Libraries with a Library Quality Score of Greater than 0.95

 

Supplementary File 1

 

Supplementary Information for Lehner et al. (2002)

 

NCBI UniGene

 

Celera

 

SIM4

 

Church Lab

 

Related References

 

  1. Shendure J., Church G.M.  Computational Discovery of Sense-Antisense Transcription in the Human & Mouse Genomes.  (submitted)

  2. Moore T. et al. Proc Natl Acad Sci U S A 94, 12509-14 (1997).

  3. Sleutels F., Zwart R., Barlow D.P. Nature 415, 810-3 (2002).

  4. Billy E., Brondani V., Zhang H., Muller U., Filipowicz W. Proc Natl Acad Sci U S A 98, 14428-33 (2001).

  5. Li A.W., Murphy P.R. Mol Cell Endocrinol 170, 233-42 (2000).

  6. Munroe S.H., Lazar M.A. J Biol Chem 266, 22083-6 (1991).

  7. Lee J.T., Davidow L.S., Warshawsky D. Nat Genet 21, 400-4 (1999).

  8. Kumar M., Carmichael G.G. Proc Natl Acad Sci U S A 94, 3542-7 (1997).

  9. Lipman D.J. Nucleic Acids Res 25, 3580-3 (1997).

  10. Lehner B., Williams G., Campbell R.D., Sanderson C.M. Trends Genet 18, 63-5 (2002).

  11. Florea L., Hartzell G., Zhang Z., Rubin G.M., Miller W. Genome Res 8, 967-74 (1998).

  12. Zhang Z., Schwartz S., Wagner L., Miller W. J Comput Biol 7, 203-14 (2000).

  13. Altschul S.F., Gish W., Miller W., Myers E.W., Lipman D.J. J Mol Biol 215, 403-10 (1990).

 

Contact Information

 

Jay Shendure         

 

jay_shendure@student.hms.harvard.edu

 

Church Lab

Department of Genetics

Harvard Medical School

 

Last revised : May 2, 2002