Prashant Mali1,5, Luhan Yang1,3,5, Kevin M. Esvelt2, John Aach1, Marc Guell1, James E. DiCarlo4, Julie E. Norville1, George M. Church1,2,*
1Department of Genetics, Harvard Medical School, Boston, Massachusetts, USA.
2Wyss Institute for Biologically Inspired Engineering, Harvard University, Cambridge, Massachusetts, USA.
3Biological and Biomedical Sciences Program, Harvard Medical School, Boston, Massachusetts, USA.
4Department of Biomedical Engineering, Boston University, Boston, Massachusetts, USA.
5These authors contributed equally to this work.
*Correspondence should be addressed
This web site provides access to the following supplemental tables and data files described in the article.
Table 2: Incorporation of gRNA targets in Table 1 into a 200 bp format suitable for multiplex DNA array-based synthesis.
You may download this as a text file, Excel spreadsheet, or four FASTA files, one for each array: OLS1, OLS2, OLS3, OLS4.
Table 3: 12k gRNA targets from Supplementary Table 1 in a 200 bp format synthesized by CustomArray Inc.
You may download this as a text file, Excel spreadsheet, or a FASTA file. Five gRNAs for CDH1 and five for TP53 were successfully retrieved from the 12k oligonucleotide pool using the methodology described in Supplementary Figure 11. The 10 rows corresponding to these gRNAs are highlighted in yellow in the Excel spreadsheet version of the file.
See our Methods and Supplementary Methods for details on how these sequences were generated. The following Notes provide additional information.
1. Merged exon regions are contiguous genomic regions that arise from merging overlapping coding sequence exons of one or more RefSeq genes. RefSeq gene exons are denoted by their RefSeq gene accession numbers followed by the suffix _exN, where N is an integer representing the number of the exon in the gene. For instance, NM_153366_ex3 is the name given to exon 3 of RefSeq gene NM_153366. As only gRNAs targeting coding sequence were developed, non-coding sequences within exons were eliminated from consideration and only the coding portions of exons were included in our analysis. Exons that were truncated in this way are indicated by a T suffix on the exon name. For instance, from the name NM_033051_ex1T, it can be inferred that exon 1 of NM_033051 contains both coding and non-coding sequence, and that only the coding part was considered in the analysis. Finally, merged exon regions are denoted by compound names formed by concatenating the names of the RefSeq gene exons that were merged to form the region, separated by semicolons. For instance, NM_001173425_ex1T;NM_015404_ex1 denotes a merged exon region that comprises the coding part of the first exon of NM_001173425 and the entire first exon of NM_015404. Note that the ends of merged exon regions were extended by small amounts (20 bp) of genome sequence to facilitate targeting of small exonic regions and the edges of exonic regions.
2. While most merged exon regions are likely to be single exons represented in multiple RefSeq-annotated regions, each of which describes a different isoform of a single gene, there may be cases where overlapping exons of different sizes and offsets contribute to a merged region that is larger than the constituent exon regions that were merged together. In such cases, a gRNA target site within the larger merged region may not actually overlap a smaller constituent exon. Therefore, when using these gRNAs to target a particular exon of a particular gene of interest within a merged exon region, you should check to see that the sequence actually overlaps the exon of interest.
3. gRNA target sites within merged exon regions are each named by adding a site number to the name of the merged exon region. For instance, NM_032129_ex13;NM_001160184_ex12:site_12 is the name of an individual gRNA target site in the merged exon region NM_032129_ex13;NM_001160184_ex12. However, the site numbers in these names do not indicate relative locations among consecutive target sites in a region; they serve only to distinguish a site from other sites in the same region. The site numbers originate from the relative locations of the candidate target sites initially identified in each merged exon region, but these candidates were subsequently filtered for specificity (as described in our Methods and Supplementary Methods). As many of the candidates failed the specificity test, the remaining site numbers no longer form a consecutive series.
4. Many RefSeq genes are annotated as having multiple locations in the genome, likely a result of duplications. We distinguished the individual instances of such a gene by appending a distinct number in parentheses to the RefSeq accession number of the gene. For instance, NM_001144767(1) and NM_001144767(2) denote different instances of the gene NM_001144767. In the GRCh37/hg19 genome annotations used in this analysis, 705 RefSeq genes were identified as having instances in multiple locations. Despite the sequence similarity of these locations, 1683 gRNA sites passing the specificity criteria described in our Methods and Supplementary Methods were identified in merged exonic regions containing such multiple-instance genes.
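The naming conventions in Notes 1, 3 and 4 can be decoded programmatically. The following Python sketch is illustrative only and is not part of the published analysis. The function and class names are hypothetical. It also assumes that the parenthesized instance number sits directly after the accession and before the _exN suffix, as in NM_001144767(1)_ex2; the Notes do not state this, so check it against the downloaded files.

```python
import re
from dataclasses import dataclass
from typing import Optional, Tuple

# Pattern for one exon name, e.g. NM_033051_ex1T.
# ASSUMPTION: an instance number, when present, directly follows the
# accession, e.g. NM_001144767(1)_ex2. The Notes do not confirm this.
_EXON_RE = re.compile(
    r"^(?P<accession>[A-Z]{2}_\d+)"   # RefSeq accession, e.g. NM_153366
    r"(?:\((?P<instance>\d+)\))?"     # optional instance number (Note 4)
    r"_ex(?P<exon>\d+)"               # exon number (Note 1)
    r"(?P<truncated>T)?$"             # T = only the coding part was used
)


@dataclass(frozen=True)
class Exon:
    accession: str
    instance: Optional[int]
    number: int
    truncated: bool


@dataclass(frozen=True)
class TargetSite:
    region: str
    exons: Tuple[Exon, ...]
    site: int


def parse_exon(name: str) -> Exon:
    """Parse a single exon name such as NM_153366_ex3."""
    match = _EXON_RE.match(name.strip())
    if match is None:
        raise ValueError(f"unrecognized exon name: {name!r}")
    instance = match.group("instance")
    return Exon(
        accession=match.group("accession"),
        instance=int(instance) if instance is not None else None,
        number=int(match.group("exon")),
        truncated=match.group("truncated") is not None,
    )


def parse_region(name: str) -> Tuple[Exon, ...]:
    """Split a merged exon region name on semicolons (Note 1)."""
    return tuple(parse_exon(part) for part in name.split(";"))


def parse_target_site(name: str) -> TargetSite:
    """Parse a gRNA target site name of the form <region>:site_<N> (Note 3)."""
    region, sep, site = name.strip().rpartition(":site_")
    if not sep or not site.isdigit():
        raise ValueError(f"unrecognized target site name: {name!r}")
    return TargetSite(region=region, exons=parse_region(region), site=int(site))


if __name__ == "__main__":
    site = parse_target_site("NM_032129_ex13;NM_001160184_ex12:site_12")
    print(site.region, site.site)
    for exon in site.exons:
        print(exon)
```

Because site numbers are labels only (Note 3), the sketch stores the site number as an identifier and makes no attempt to derive a position from it. As Note 2 advises, confirming that a target site overlaps a particular constituent exon still requires comparing genomic coordinates, which these names do not encode.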
Reagents: All reagents will be available via Addgene.