Carcinogenesis, the progression of normal cells to malignant cancer, arises from the hallmark capabilities of cancer, which are driven by the acquisition of somatic mutations in driver genes that confer a selective advantage for cellular proliferation and, potentially, metastasis. A major motivation for modern cancer genomics studies is to decipher the genetic architecture of cancer by discovering new driver genes. The most widely used approaches to predict and prioritize driver genes are based on statistics of mutation frequencies. Several methods have been proposed to identify genes with an excess of somatic mutations [9-11], known as significantly mutated genes. I propose to address two major limitations of this approach. First, these methods lack sufficient statistical power given the amount of sequencing data currently available. I will improve statistical power by developing a machine learning method that integrates the diverse information currently available in cancer genomics. Second, there is little objective clarity about the true effectiveness of these methods [11, 14], since there is no agreed-upon gold standard of driver genes beyond a few well-known drivers. I will develop a framework to compare the effectiveness of driver gene prediction methods in the absence of a gold standard. Identifying cancer driver genes both effectively and efficiently is of great importance to science funding policy for cancer genomics.
Large sequencing studies have revolutionized our ability to identify the genetic architecture of cancer. However, effectively integrating this stream of big data to identify specific driver genes has remained difficult. My proposed research project aims to develop an integrative machine learning method that leverages diverse features in cancer genomics to improve predictions of cancer driver genes, and to use a principled approach for evaluating the performance of any such method.
|Reiter, Johannes G; Makohon-Moore, Alvin P; Gerold, Jeffrey M et al. (2018) Minimal functional driver gene heterogeneity among untreated metastases. Science 361:1033-1037|
|Ng, Patrick Kwok-Shing; Li, Jun; Jeong, Kang Jin et al. (2018) Systematic Functional Annotation of Somatic Mutations in Cancer. Cancer Cell 33:450-462.e10|
|Cai, Binghuang; Li, Biao; Kiga, Nikki et al. (2017) Matching phenotypes to whole genomes: Lessons learned from four iterations of the personal genome project community challenges. Hum Mutat 38:1266-1276|
|Tokheim, Collin J; Papadopoulos, Nickolas; Kinzler, Kenneth W et al. (2016) Evaluating the evaluation of cancer driver genes. Proc Natl Acad Sci U S A 113:14330-14335|
|Tokheim, Collin; Bhattacharya, Rohit; Niknafs, Noushin et al. (2016) Exome-Scale Discovery of Hotspot Mutation Regions in Human Cancer Using 3D Protein Structure. Cancer Res 76:3719-31|