Studies of germline genetic variation in cancer cases and controls, as well as studies of somatic mutation, have transformed our understanding of cancer etiology and led to the development of life-saving cancer interventions. However, even though tumor progression, evolution, and treatment response are influenced by both somatic and germline variation, these data have largely been examined in isolation. In this work, we propose to integrate extensive data collection, novel statistical methods, and cutting-edge functional validation to discover and characterize somatic-germline interactions in a pan-cancer study. Results from our work will significantly benefit both cancer researchers and multiple medical research disciplines more broadly. Within the cancer genetics field, identifying somatic-germline interactions will help (i) identify new classes of drug targets causally upstream of those identified through somatic driver mutations, (ii) precisely treat patients by selecting interventions on the basis of germline and somatic genetics as well as tumor RNA-sequencing, (iii) improve risk profiling, especially for tumor recurrence and outcomes, and (iv) develop hypotheses about the mechanisms of germline risk variants, especially non-coding variants. To accomplish these goals, we will leverage tumor sequencing from the DFCI Profile Project together with recent innovations in variant imputation to assemble the largest (N>25,000) pan-cancer germline-somatic cohort to date. We will develop novel statistical and computational methods to maximize the value of these data. Because over 90% of germline genetic variation associated with cancer risk and outcomes is in non-coding regions of the genome, we especially focus on integration of functional genomic sequencing from both tumor and normal tissues. Our methods will be capable of modelling proximal germline-somatic interactions as well as distal effects of germline variation on trans and global somatic changes.
Furthermore, by focusing largely on RNA-sequencing, we investigate a gene-centric model that provides specific mechanistic hypotheses, which are readily validated through our experimental follow-up of non-coding variation that is otherwise difficult to interpret.
PROJECT NARRATIVE
The goal of our project is to accelerate the discovery and understanding of how genetic variation in humans and genetic variation in human tumors interact to affect clinically important components of cancer etiology, including tumor growth, evolution, and response to therapy. To accomplish this goal, we have assembled the world's largest cohort of cancer patients with germline, somatic, and clinical data. Of particular importance is the development of powerful statistical methods for analyzing these data. Achieving this goal requires expertise across many domains of knowledge, including medical and population genomics, algorithm development, and clinical cancer databases.