Most variants obtained from tumor whole-genome sequences (WGS) occur in non-coding regions of the genome. Although variants in protein-coding regions have received the majority of attention, numerous studies have now noted the importance of non-coding variants in cancer. Identification of functional non-coding variants that drive tumor growth remains a challenge and a bottleneck for the use of whole-genome sequencing in the clinic. Cancer drivers are generally identified by the high frequency at which their mutations occur across patients. However, the mutation rate is highly heterogeneous in non-coding regions, and many non-driver elements show a higher mutation frequency than others, such as regions bound by transcription factors in melanoma or regions replicating late during cell division in colon cancer. In this proposal, we will use a high-throughput pooled CRISPR screen and novel computational methods to predict non-coding cancer drivers. We will quantitatively measure the impact of thousands of non-coding mutations using our innovative high-throughput CRISPR screen, which directly ties modifications in the native context of the non-coding genome (i.e., not a reporter assay) to a cancer-relevant phenotype (cell growth). The results of the screen will be used as training data for the development of NC_Driver, a computational cancer driver prediction tool. NC_Driver will integrate the signals of high functional impact with the recurrence of variants across multiple tumor samples to identify the non-coding mutations under positive selection in cancer. We will identify drivers in promoters, enhancers, and CTCF insulators. CTCF insulators are the most mutated yet least studied regulatory elements in the cancer genome. Using this integrative experimental and computational approach, we will identify high-confidence candidate drivers. Finally, we will perform functional evaluation of prioritized non-coding drivers in colorectal and prostate cancers.
We will use CRISPR/Cas9 genome editing in patient-derived cell cultures to test 20 high-ranking candidate driver mutations in promoters, enhancers, and insulators. Overall, this proposal addresses the critical need to identify drivers in the non-coding genome and, over the long term, to enable the maximal benefit of genome sequencing for each patient.

Public Health Relevance

Cancer genomes contain thousands of mutations but only a few of them play an important role in cancer proliferation and are called drivers. Most of the mutations occur in regions of the genome that do not make proteins, yet the majority of previous studies have focused on protein-coding regions. In this proposal, we will use integrative computational and experimental approaches to identify drivers in the non-protein-coding regions of the genome.

National Institutes of Health (NIH)
National Cancer Institute (NCI)
Research Project (R01)
Study Section: Genomics, Computational Biology and Technology Study Section (GCAT)
Program Officer: Divi, Rao L
Weill Medical College of Cornell University
Internal Medicine/Medicine
Schools of Medicine
New York
United States