Carcinogenesis, the progression of normal cells to malignant cancer, arises from hallmark capabilities acquired through somatic mutations in driver genes, which confer a selective advantage for cellular proliferation and potentially metastasis. A major motivation for modern cancer genomics studies is to decipher the genetic architecture of cancer by discovering new driver genes. The most widely used approaches to predict and prioritize driver genes are based on statistics of mutation frequencies. Several methods have been proposed to identify genes with an excessive number of somatic mutations [9-11], known as significantly mutated genes. I propose to address two major limitations of this approach. First, these methods lack sufficient statistical power given the amount of sequencing data currently available [15]. I will improve statistical power by developing a machine learning method that integrates the diverse cancer genomics information currently available. Second, there is little objective evidence of how effective these methods truly are [11, 14], because no gold standard of driver genes has been agreed upon beyond a few well-known drivers. I will develop a framework to compare the effectiveness of driver gene prediction methods in the absence of a gold standard. Identifying cancer driver genes both effectively and efficiently is of great importance to science funding policy for cancer genomics.

Public Health Relevance

Large sequencing studies have revolutionized our ability to characterize the genetic architecture of cancer. However, effectively integrating this stream of big data to identify specific driver genes has remained difficult. My proposed research project aims to develop an integrative machine learning method that leverages diverse features in cancer genomics to improve predictions of cancer driver genes, and to apply a principled approach for evaluating the performance of any such method.

Agency
National Institutes of Health (NIH)
Institute
National Cancer Institute (NCI)
Type
Predoctoral Individual National Research Service Award (F31)
Project #
5F31CA200266-03
Application #
9322626
Study Section
Special Emphasis Panel (ZRG1)
Program Officer
Radaev, Sergey
Project Start
2015-09-16
Project End
2018-09-15
Budget Start
2017-09-16
Budget End
2018-09-15
Support Year
3
Fiscal Year
2017
Total Cost
Indirect Cost
Name
Johns Hopkins University
Department
Biostatistics & Other Math Sci
Type
Biomed Engr/Col Engr/Engr Sta
DUNS #
001910777
City
Baltimore
State
MD
Country
United States
Zip Code
21205
Reiter, Johannes G; Makohon-Moore, Alvin P; Gerold, Jeffrey M et al. (2018) Minimal functional driver gene heterogeneity among untreated metastases. Science 361:1033-1037
Ng, Patrick Kwok-Shing; Li, Jun; Jeong, Kang Jin et al. (2018) Systematic Functional Annotation of Somatic Mutations in Cancer. Cancer Cell 33:450-462.e10
Cai, Binghuang; Li, Biao; Kiga, Nikki et al. (2017) Matching phenotypes to whole genomes: Lessons learned from four iterations of the personal genome project community challenges. Hum Mutat 38:1266-1276
Tokheim, Collin J; Papadopoulos, Nickolas; Kinzler, Kenneth W et al. (2016) Evaluating the evaluation of cancer driver genes. Proc Natl Acad Sci U S A 113:14330-14335
Tokheim, Collin; Bhattacharya, Rohit; Niknafs, Noushin et al. (2016) Exome-Scale Discovery of Hotspot Mutation Regions in Human Cancer Using 3D Protein Structure. Cancer Res 76:3719-31