Genetic heterogeneity is a common feature of many diseases: different causal variants, or mutations, are present in different individuals with the disease. Genetic heterogeneity complicates the identification of the genetic basis of disease, as any modestly sized study will contain individuals with different causal genetic variants. One reason for this heterogeneity is that causal variants occur in groups of genes that interact in various cellular signaling and regulatory pathways. Genetic heterogeneity therefore demands that combinations of variants, rather than individual variants, be tested for association with a disease. However, while individual variants can be tested exhaustively for association, combinations of variants cannot: there are too many combinations to test, and the number of samples required to reach statistical significance after correcting for so many tests would be astronomical. We propose to develop new computational and statistical approaches to identify combinations of variants that are associated with a disease. In contrast to existing approaches, we do not restrict attention a priori to known pathways or groups of genes. Rather, our algorithms use genome-scale interaction networks and combinatorial/statistical constraints to identify combinations of variants and rigorously assess their statistical significance. Further, we extend these approaches to find associations between combinations of variants and clinical parameters such as survival time or response to treatment. We will apply these techniques to cancer genome sequencing projects, including The Cancer Genome Atlas (TCGA), in collaboration with several biomedical research groups. Successful completion of the proposed research will facilitate the study of genetically heterogeneous diseases, in particular cancer, using only the modest number of samples attainable with present DNA sequencing technologies.
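The claim that combinations of variants cannot be tested exhaustively can be made concrete with a little arithmetic. The sketch below is illustrative only and is not the method proposed here. It assumes roughly 20,000 human protein-coding genes (an approximate, commonly cited figure), treats each k-gene combination as one hypothesis test, and uses a Bonferroni correction at a family-wise level of 0.05. Bonferroni is one standard multiple-testing correction, chosen here only because it is simple to compute. The function name `bonferroni_thresholds` is hypothetical.

```python
from math import comb


def bonferroni_thresholds(num_genes: int, max_k: int, alpha: float = 0.05):
    """Hypothetical helper: for k = 1..max_k, return (k, number of k-gene
    combinations, Bonferroni-corrected per-test significance level)."""
    rows = []
    for k in range(1, max_k + 1):
        n_tests = comb(num_genes, k)           # number of distinct k-gene sets
        rows.append((k, n_tests, alpha / n_tests))  # per-test threshold
    return rows


if __name__ == "__main__":
    # Assumption: roughly 20,000 protein-coding genes in the human genome.
    for k, n_tests, threshold in bonferroni_thresholds(20_000, 3):
        print(f"k={k}: {n_tests:.3e} combinations, per-test alpha {threshold:.1e}")
```

Moving from single genes to triples multiplies the number of tests by a factor of tens of millions. The per-test threshold shrinks by the same factor, and detecting an association at that stricter level requires a correspondingly larger cohort. This is the motivation for constraining the search with interaction networks rather than enumerating all combinations.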

Public Health Relevance

Identifying the inherited genetic differences associated with a disease and the acquired mutations that lead to cancer are major challenges in medicine. Next-generation DNA sequencing technologies enable measurement of these genetic variants, but interpreting the resulting data demands new computational and statistical approaches. This is particularly true as many diseases are heterogeneous, with many possible genetic causes. We will develop novel algorithms to aid in the discovery of disease-causing genetic variants that will enable better diagnostics and/or personalized treatments for various diseases.

National Institutes of Health (NIH)
Research Project (R01)
Study Section: Genomics, Computational Biology and Technology Study Section (GCAT)
Program Officer: Struewing, Jeffery P
Brown University
Biostatistics & Other Math Sci
Schools of Arts and Sciences
United States