Cancer genomics projects have successfully cataloged many of the frequent genomic, epigenetic, and gene expression alterations that drive cancer progression. However, these initial projects have also demonstrated that both the types and targets of genomic aberrations are incredibly heterogeneous, reflecting the large diversity of perturbations in the cellular machinery that promote tumor growth and metastasis. As cancer sequencing efforts expand to determine the molecular basis of additional phenotypes such as drug resistance or exceptional responders, novel methods to integrate data from multiple genomic characterization platforms across combinations of alterations in pathways and interaction networks are essential. We propose to build a Genome Data Analysis Center (GDAC) focused on pathway analysis. Our GDAC will integrate data from multiple genome characterization platforms, and use several computational approaches to identify combinations of genomic aberrations and downstream expression changes that distinguish clinical phenotypes. We will employ algorithms that utilize information about known pathways and/or biological interaction networks, as well as other approaches that analyze statistical patterns of mutual exclusivity and co-occurrence between alterations and clinical variables. We will combine the discovered pathways with knowledge of drugs and their targets to identify novel interventions in individual patients. Finally, we will augment the computational analyses with a web platform for interactive visualization and annotation of discovered pathways. This human-in-the-loop system will accelerate the annotation of mutations, pathways, and interventions and provide a dynamic ecosystem linking cancer genomics datasets to new and existing literature. 
By combining rigorous computational and statistical approaches with human-in-the-loop annotation, the proposed GDAC will facilitate the translation of multi-platform genome characterization data to clinical application.

Public Health Relevance

We will create a Genome Data Analysis Center (GDAC) to analyze data from cancer genome sequencing projects. Our center will develop and apply novel computational approaches to identify biological pathways that are altered in cancer, suggesting new approaches for cancer diagnosis and treatment.

National Institutes of Health (NIH)
National Cancer Institute (NCI)
Resource-Related Research Projects--Cooperative Agreements (U24)
Study Section: Special Emphasis Panel (ZCA1)
Program Officer: Yang, Liming
Princeton University
Biostatistics & Other Math Sci
Biomed Engr/Col Engr/Engr Sta
United States
Radovich, Milan; Pickering, Curtis R; Felau, Ina et al. (2018) The Integrated Genomic Landscape of Thymic Epithelial Tumors. Cancer Cell 33:244-258.e10
El-Kebir, Mohammed; Satas, Gryte; Raphael, Benjamin J (2018) Inferring parsimonious migration histories for metastatic cancers. Nat Genet 50:718-726
Huang, Kuan-Lin; Mashl, R Jay; Wu, Yige et al. (2018) Pathogenic Germline Variants in 10,389 Adult Cancers. Cell 173:355-370.e14
Cancer Genome Atlas Research Network (2017) Integrated Genomic Characterization of Pancreatic Ductal Adenocarcinoma. Cancer Cell 32:185-203.e13
Cancer Genome Atlas Research Network; Analysis Working Group: Asan University; BC Cancer Agency et al. (2017) Integrated genomic characterization of oesophageal carcinoma. Nature 541:169-175
Oesper, Layla; Dantas, Simone; Raphael, Benjamin J (2017) Identifying simultaneous rearrangements in cancer genomes. Bioinformatics :