Genetic heterogeneity is a common feature of many diseases, with different causal variants, or mutations, present in different individuals with the disease. Genetic heterogeneity complicates the identification of the genetic basis of disease, as any modest-sized study will contain individuals with different causal genetic variants. One reason for this heterogeneity is that causal variants are present in groups of genes that interact in various cellular signaling and regulatory pathways. Genetic heterogeneity demands the testing of combinations of variants, rather than individual variants, for association with a disease. However, while individual variants can be tested exhaustively for association, combinations of variants cannot: there are too many combinations to test, and the number of samples required for statistical significance would be astronomical. We propose to develop new computational and statistical approaches to identify combinations of variants that are associated with a disease. In contrast to existing approaches, we do not restrict attention a priori to known pathways or groups of genes. Rather, our algorithms use genome-scale interaction networks and combinatorial and statistical constraints to identify combinations of variants and rigorously assess their statistical significance. Further, we extend these approaches to find associations between combinations of variants and clinical parameters such as survival time or response to treatment. We will apply these techniques to cancer genome sequencing projects, including The Cancer Genome Atlas (TCGA), in collaboration with several biomedical research groups. Successful completion of the proposed research will facilitate the study of genetically heterogeneous diseases, and in particular cancer, using only the modest number of samples attainable with present DNA sequencing technologies.

Public Health Relevance

Identifying the inherited genetic differences associated with a disease and identifying the acquired mutations that lead to cancer are major challenges in medicine. Next-generation DNA sequencing technologies enable measurement of these genetic variants, but interpreting the resulting data demands new computational and statistical approaches. This is particularly true because many diseases are heterogeneous, with many possible genetic causes. We will develop novel algorithms to aid the discovery of disease-causing genetic variants, which in turn will enable better diagnostics and/or personalized treatments for various diseases.

National Institutes of Health (NIH)
National Human Genome Research Institute (NHGRI)
Research Project (R01)
Study Section
Genomics, Computational Biology and Technology Study Section (GCAT)
Program Officer
Struewing, Jeffery P
Brown University
Biostatistics & Other Math Sci
Schools of Arts and Sciences
United States
Nakka, Priyanka; Raphael, Benjamin J; Ramachandran, Sohini (2016) Gene and Network Analysis of Common Variants Reveals Novel Associations in Multiple Complex Diseases. Genetics 204:783-798
Leiserson, Mark D M; Reyna, Matthew A; Raphael, Benjamin J (2016) A weighted exact test for mutually exclusive mutations in cancer. Bioinformatics 32:i736-i745
Raphael, Benjamin J; Vandin, Fabio (2015) Simultaneous inference of cancer pathways and tumor progression from cross-sectional mutation data. J Comput Biol 22:510-27
Vandin, Fabio; Papoutsaki, Alexandra; Raphael, Benjamin J et al. (2015) Accurate computation of survival statistics in genome-wide studies. PLoS Comput Biol 11:e1004071
Leiserson, Mark D M; Gramazio, Connor C; Hu, Jason et al. (2015) MAGI: visualization and collaborative annotation of genomic aberrations. Nat Methods 12:483-4
Lu, Charles; Xie, Mingchao; Wendl, Michael C et al. (2015) Patterns and functional implications of rare germline variants across 12 cancer types. Nat Commun 6:10086
Leiserson, Mark D M; Vandin, Fabio; Wu, Hsin-Ta et al. (2015) Pan-cancer network analysis identifies combinations of rare somatic mutations across pathways and protein complexes. Nat Genet 47:106-14
Mutation Consequences and Pathway Analysis working group of the International Cancer Genome Consortium (2015) Pathway and network analysis of cancer genomes. Nat Methods 12:615-21
Leiserson, Mark D M; Wu, Hsin-Ta; Vandin, Fabio et al. (2015) CoMEt: a statistical approach to identify combinations of mutually exclusive alterations in cancer. Genome Biol 16:160
Hoadley, Katherine A; Yau, Christina; Wolf, Denise M et al. (2014) Multiplatform analysis of 12 cancer types reveals molecular classification within and across tissues of origin. Cell 158:929-44
