Genomic alterations drive cancer development in pediatric and adult patients, but pediatric cancers display a relative scarcity of somatic variants. This observation, coupled with the earlier age of onset in familial cancer syndromes, suggests that germline variants contribute to the development of many pediatric cancers. Many known pathogenic germline variants exhibit biological features such as loss of heterozygosity and rarity within the general population; certain germline cancer-causing variants also cause cancer when acquired somatically. Identifying pathogenic germline variants from whole-exome sequencing (WES) data would clarify the pathogenesis of pediatric cancers and might support improved treatments for cancer subtypes with poor prognosis, such as relapsed acute lymphoblastic leukemia (ALL), high-grade glioma (glioma), and ependymoma. The long-term goal of this research is to understand the role of germline variants in pediatric cancer development using genomic data. The objective of this application is to develop a computational biology tool that identifies pathogenic germline variants from WES data, and to apply this tool to analyze germline variants in relapsed ALL, glioma, and ependymoma. This application's central hypothesis is that pathogenic germline variants have biological and cohort-specific features that distinguish them from benign variants, such that a machine-learning pipeline trained on these features can predict pathogenic germline variants.
Aim 1 will call germline variants in WES data from 1) pediatric patients with known cancer-causing germline variants and 2) control patients without cancer; patients in data set 1 will be randomly divided into training and test sets. A bioinformatics pipeline will call variants, add annotations, and filter out low-quality variants.
Aim 2 will use the training set to train and optimize a machine-learning algorithm to predict high-confidence germline driver variants.
Aim 3 will apply the final pipeline to validation sets of pediatric relapsed ALL, glioma, and ependymoma samples.
These aims will generate a computational pipeline that predicts pathogenicity of germline variants using a pediatric cancer cohort, enabling improved understanding of the contribution of germline variants to multiple pediatric cancers.
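
The workflow in Aims 1 and 2 can be illustrated with a minimal sketch. Everything specific in it is an assumption made for illustration rather than a detail of the proposed pipeline: the record fields (`qual`, `depth`, `pop_af`, `loh`, `seen_somatic`), the quality cutoffs, the split fraction, and the use of a plain logistic regression as a stand-in for the unspecified machine-learning algorithm. The three features mirror the biological properties named in the abstract: rarity in the general population, loss of heterozygosity, and recurrence as a somatic variant.

```python
import math
import random


def filter_low_quality(variants, min_qual=30.0, min_depth=10):
    """Keep variant calls that pass illustrative quality and depth cutoffs."""
    return [v for v in variants if v["qual"] >= min_qual and v["depth"] >= min_depth]


def split_patients(patient_ids, test_fraction=0.25, seed=0):
    """Randomly split patients (not individual variants) into training and test sets."""
    ids = sorted(set(patient_ids))
    random.Random(seed).shuffle(ids)
    n_test = max(1, round(len(ids) * test_fraction))
    return set(ids[n_test:]), set(ids[:n_test])


def features(variant):
    """Hypothetical feature vector: bias, rarity, LOH flag, seen-as-somatic flag."""
    rarity = -math.log10(variant["pop_af"] + 1e-6)  # rarer variants score higher
    return [1.0, rarity, float(variant["loh"]), float(variant["seen_somatic"])]


def sigmoid(z):
    z = max(-30.0, min(30.0, z))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(variants, labels, lr=0.1, epochs=500):
    """Fit logistic-regression weights with plain stochastic gradient ascent."""
    rows = [features(v) for v in variants]
    weights = [0.0] * len(rows[0])
    for _ in range(epochs):
        for x, y in zip(rows, labels):
            p = sigmoid(sum(w * xi for w, xi in zip(weights, x)))
            for j, xj in enumerate(x):
                weights[j] += lr * (y - p) * xj
    return weights


def predict_proba(weights, variant):
    """Predicted probability that a variant is pathogenic under the toy model."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, features(variant))))
```

In this sketch the split is made at the patient level, so that all variants from one patient fall on the same side of the training/test boundary; the predicted probabilities could then be thresholded to retain only high-confidence calls.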

Public Health Relevance

An improved understanding of how certain germline DNA variants (those DNA changes that are inherited or occur early in development) promote cancer might lead to better treatments for pediatric cancers. However, identifying which germline variants cause cancer is challenging without genetic information from family members. To address this, this research proposal develops a computational tool that learns from germline variants known to cause cancer and then predicts cancer-causing variants from pediatric cancer patient DNA sequences in the absence of family information; these predicted variants will undergo computational and experimental testing, and may clarify how pediatric cancers like relapsed acute lymphoblastic leukemia, glioma, and ependymoma develop.

National Institutes of Health (NIH)
National Cancer Institute (NCI)
Predoctoral Individual National Research Service Award (F31)
Study Section
Special Emphasis Panel (ZRG1)
Program Officer
Mcneil, Nicole E
Columbia University (N.Y.)
Internal Medicine/Medicine
Schools of Medicine
New York
United States