Cancer Susceptibility Variant Discovery in High Throughput Sequencing Data

Ding, Li

Abstract

Large-scale cancer genomics projects such as The Cancer Genome Atlas (TCGA), the International Cancer Genome Consortium (ICGC), Therapeutically Applicable Research to Generate Effective Treatments (TARGET), and the Pediatric Cancer Genome Project (PCGP) are producing a wealth of high-throughput sequence data from large numbers of cancer samples and their matched normals. These data hold great promise for understanding the genetic basis of cancer and for identifying germline variants that confer cancer susceptibility. Major advances have been made in systematically cataloging somatic variations in cancer genomes from these data sets. However, identifying and interpreting germline changes using data from these studies remains a significant challenge. The primary difficulties are 1) the lack of computational pipelines/tools that use tumor and normal sequencing data to simultaneously detect somatic, germline, and loss-of-heterozygosity (LOH) events at the nucleotide and chromosomal levels and 2) the lack of uniform bioinformatics analysis strategies for identifying and prioritizing deleterious candidate germline variants responsible for susceptibility. We will develop a computational pipeline for the identification and interpretation of germline alterations in cancer, including single nucleotide variants, insertions and deletions (indels), copy number variations, and structural variants. This pipeline will initially be used to systematically analyze whole-genome, exome, and RNA-sequencing data from over 5,000 cancer cases already generated by several major efforts and individual research groups, as well as additional cases that will be made publicly available in the next several years. Deleterious germline variants predicted in silico from these data will be used for statistical association analysis across groups stratified by age and cancer type to identify novel germline susceptibility variants, genes, and pathways involved in different cancer types.
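The abstract does not name the statistical test used for the stratified association analysis. One common choice for comparing carrier frequencies of deleterious germline variants between two groups while adjusting for strata such as cancer type and age band is the Cochran-Mantel-Haenszel (CMH) test. The minimal Python sketch below assumes that choice; the function name `cmh_test`, the 2x2 table layout, and the example counts are hypothetical and are not taken from the project.

```python
import math
from typing import List, Tuple

# One 2x2 table per stratum (e.g. one cancer type x age band), laid out as
# (a, b, c, d) = (group-1 carriers, group-1 non-carriers,
#                 group-2 carriers, group-2 non-carriers).
# This layout is an assumption made for illustration.
Table = Tuple[int, int, int, int]


def cmh_test(strata: List[Table]) -> Tuple[float, float, float]:
    """Cochran-Mantel-Haenszel test across strata (hypothetical helper).

    Returns (chi-square statistic without continuity correction,
    two-sided p-value, Mantel-Haenszel common odds ratio).
    """
    sum_a = sum_expected = sum_var = 0.0
    or_num = or_den = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        if n < 2:
            continue  # a stratum this small carries no information
        row1, row2 = a + b, c + d
        col1, col2 = a + c, b + d
        sum_a += a
        sum_expected += row1 * col1 / n
        sum_var += row1 * row2 * col1 * col2 / (n * n * (n - 1))
        or_num += a * d / n
        or_den += b * c / n
    if sum_var == 0:
        raise ValueError("no variance across strata; test is undefined")
    stat = (sum_a - sum_expected) ** 2 / sum_var
    # Survival function of a chi-square with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2.0))
    odds_ratio = or_num / or_den if or_den > 0 else math.inf
    return stat, p_value, odds_ratio


# Hypothetical counts for two strata, for illustration only.
stat, p_value, odds_ratio = cmh_test([(10, 90, 5, 95), (8, 42, 4, 46)])
print(f"CMH chi2={stat:.3f} p={p_value:.4f} OR_MH={odds_ratio:.3f}")
```

Pooling per-stratum tables this way keeps differences in baseline carrier rates between cancer types or age groups from masquerading as an association; a real analysis would also need multiple-testing correction across genes or pathways.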
We will further investigate the potential interaction between germline susceptibility variants and the somatic mutational landscape. Finally, both the pipeline and the results from this project will be made publicly available, helping the research community analyze and interpret the ever-growing body of large-scale cancer sequencing data to better discover and understand germline susceptibility variants.
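The simultaneous detection of somatic, germline, and LOH events described in the abstract can be illustrated by comparing variant allele fractions (VAFs) at a single site in the matched normal and tumor. This Python sketch is an assumption for illustration, not the project's pipeline or VarScan 2's algorithm. The thresholds `MIN_HET_VAF` and `MIN_HOM_VAF`, the `Status` labels, and the functions `genotype` and `classify_site` are hypothetical. The sketch also ignores read depth, tumor purity, and local copy number, all of which real callers must model. Tools such as VarScan 2 additionally apply statistical tests to read counts.

```python
from enum import Enum


class Status(Enum):
    REFERENCE = "reference"
    GERMLINE = "germline"
    SOMATIC = "somatic"
    LOH = "LOH"
    UNKNOWN = "unknown"


# Hypothetical VAF cut-offs; real pipelines tune these and also weigh
# read depth, tumor purity and local copy number.
MIN_HET_VAF = 0.10
MIN_HOM_VAF = 0.85


def genotype(vaf: float) -> str:
    """Collapse a variant allele fraction into a crude diploid genotype."""
    if not 0.0 <= vaf <= 1.0:
        raise ValueError(f"VAF out of range: {vaf}")
    if vaf < MIN_HET_VAF:
        return "ref"
    if vaf >= MIN_HOM_VAF:
        return "hom"
    return "het"


def classify_site(normal_vaf: float, tumor_vaf: float) -> Status:
    """Compare matched normal and tumor genotypes at one variant site."""
    normal, tumor = genotype(normal_vaf), genotype(tumor_vaf)
    if normal == "ref":
        # Variant absent from the normal: present in tumor means somatic.
        return Status.REFERENCE if tumor == "ref" else Status.SOMATIC
    if normal == "het":
        # A heterozygous germline site that becomes homozygous (for either
        # allele) in the tumor suggests loss of heterozygosity.
        return Status.GERMLINE if tumor == "het" else Status.LOH
    # Normal is homozygous for the variant allele; anything but a
    # homozygous tumor call is treated as ambiguous here.
    return Status.GERMLINE if tumor == "hom" else Status.UNKNOWN


# Hypothetical (normal VAF, tumor VAF) pairs, for illustration only.
for normal_vaf, tumor_vaf in [(0.0, 0.35), (0.48, 0.52), (0.50, 0.95), (0.02, 0.01)]:
    print(normal_vaf, tumor_vaf, classify_site(normal_vaf, tumor_vaf).value)
```

Keying the decision on the normal genotype first mirrors the intuition that the matched normal defines what is inherited; everything the tumor gains or loses relative to it is then a candidate somatic or LOH event.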

Public Health Relevance

The promise of personalized therapy for cancer will only be realized when each individual's germline and tumor genetic code can be read and analyzed in the clinical setting. The software tools and analysis strategies described in this proposal will enable efficient and cost-effective discovery of cancer-relevant genetic changes from publicly available high-throughput sequencing data, accelerating the overall understanding of genetic information and its application to human health.

CRITIQUE 1:

Significance: 2
Investigator(s): 2
Innovation: 2
Approach: 2
Environment: 2

Overall Impact: This is a very appropriate project that aims to develop bioinformatics analysis pipelines to detect and characterize the potential impact of variants detected from exome and whole-genome sequencing of cancer samples. The novelty really comes from the careful characterization and analysis of germline and somatic variants using an integrated strategy. Detection will be enhanced by directly taking the paired samples into account, and interpretation will also benefit from considering the functional impact of germline and somatic variants together. This is a very strong group with excellent preliminary data and a strong track record on the topic. The proposal will likely lead to a new generation of very useful and widely used tools to characterize germline and somatic mutations in paired cancer samples. On the downside, it was somewhat unclear how the expression and other functional genomics data would be used. The proposal is also restricted to coding variants, with no description of non-coding variants.

1. Significance:
Strengths
* Addresses the critical problem of detecting and characterizing germline and somatic mutations in cancer.
* Makes extremely efficient use of already available data sets.
Weaknesses
* Doesn't address the potential impact of non-coding variants. Are these simply ignored in all analyses? The proposal would benefit from at least discussing this confounding factor. It is unclear what fraction of disease-relevant variants are expected to be in coding regions.

2. Investigator(s):
Strengths
* This group has extensive experience with tool development for variant detection and characterization (VarScan, SomaticSniper, BreakDancer, etc.), so they are highly qualified to generate the next generation of tools that will integrate germline and somatic calls.
* Relatively junior investigator, but with experience driving large-scale projects.
Weaknesses
* Many people are involved, each with very little time committed (<1 calendar month). Will these people truly contribute? Also, only two people have more than 20% effort on this project, and it is not clear whether this is enough.

3. Innovation:
Strengths
* Focusing on germline mutations in cancer at this scale is novel.
* Integrated analysis balancing germline and somatic mutations is going to be very important and has not been done at this level.
Weaknesses
* Corresponds to a logical extension of some of the work already underway (e.g., in VarScan 2).

4. Approach:
Strengths
* The proposed work would extend some of the work done in VarScan 2. The extension to build VariantSniper is natural and very likely to become a popular tool. The proposed work to develop CopyCat to detect copy number changes in a germline- and somatic-aware manner is also realistic and promising.
* Well-defined and complementary aims.
Weaknesses
* It wasn't clear exactly how RNA-Seq would be used to prioritize candidates, especially how the allelic component would be considered. It was also unclear how other functional genomic datasets, such as methylation, will be incorporated into your pipelines/analysis.

5. Environment:
Strengths
* Very strong team already involved in an array of similar projects.

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Cancer Institute (NCI)
Type: Research Project (R01)
Project #: 4R01CA180006-04
Application #: 8997478
Study Section: Special Emphasis Panel (ZGM1)
Program Officer: Mechanic, Leah E

Project Start: 2013-02-01
Project End: 2018-01-31
Budget Start: 2016-02-01
Budget End: 2018-01-31
Support Year: 4
Fiscal Year: 2016

Institution

Name: Washington University
Department: Genetics
Type: Schools of Medicine
DUNS #: 068552207

City: Saint Louis
State: MO
Country: United States
Zip Code: 63130

Related projects


NIH 2016 R01 CA	Cancer Susceptibility Variant Discovery in High Throughput Sequencing Data Ding, Li / Washington University
NIH 2015 R01 CA	Cancer Susceptibility Variant Discovery in High Throughput Sequencing Data Ding, Li / Washington University	$263,126
NIH 2014 R01 CA	Cancer Susceptibility Variant Discovery in High Throughput Sequencing Data Ding, Li / Washington University
NIH 2013 R01 CA	Cancer Susceptibility Variant Discovery in High Throughput Sequencing Data Ding, Li / Washington University	$267,623

Publications

Jayasinghe, Reyka G; Cao, Song; Gao, Qingsong et al. (2018) Systematic Analysis of Splice-Site-Creating Mutations in Cancer. Cell Rep 23:270-281.e3

Gao, Qingsong; Liang, Wen-Wei; Foltz, Steven M et al. (2018) Driver Fusions and Their Implications in the Development and Treatment of Human Cancers. Cell Rep 23:227-238.e3

Bailey, Matthew H; Tokheim, Collin; Porta-Pardo, Eduard et al. (2018) Comprehensive Characterization of Cancer Driver Genes and Mutations. Cell 173:371-385.e18

Sengupta, Sohini; Sun, Sam Q; Huang, Kuan-Lin et al. (2018) Integrative omics analyses broaden treatment targets in human cancer. Genome Med 10:60

Huang, Kuan-Lin; Mashl, R Jay; Wu, Yige et al. (2018) Pathogenic Germline Variants in 10,389 Adult Cancers. Cell 173:355-370.e14

Huang, Kuan-Lin; Li, Shunqiang; Mertins, Philipp et al. (2017) Proteogenomic integration reveals therapeutic targets in breast cancer xenografts. Nat Commun 8:14864

Marshall, A D; Bailey, C G; Champ, K et al. (2017) CTCF genetic alterations in endometrial carcinoma are pro-tumorigenic. Oncogene 36:4100-4110

Mashl, R Jay; Scott, Adam D; Huang, Kuan-Lin et al. (2017) GenomeVIP: a cloud platform for genomic variant discovery and interpretation. Genome Res 27:1450-1459

Foltz, Steven M; Liang, Wen-Wei; Xie, Mingchao et al. (2017) MIRMMR: binary classification of microsatellite instability using methylation and mutations. Bioinformatics 33:3799-3801

Wyczalkowski, Matthew A; Wylie, Kristine M; Cao, Song et al. (2017) BreakPoint Surveyor: a pipeline for structural variant visualization. Bioinformatics 33:3121-3122

Showing the most recent 10 out of 39 publications
