Genetic heterogeneity is a common feature of many diseases: different individuals with the same disease carry different causal variants, or mutations. This heterogeneity complicates the identification of the genetic basis of disease, because any modestly sized study will contain individuals with different causal variants. One reason for it is that causal variants lie in groups of genes that interact in various cellular signaling and regulatory pathways. Genetic heterogeneity therefore demands that combinations of variants, rather than individual variants, be tested for association with a disease. However, while individual variants can be tested exhaustively for association, combinations of variants cannot. There are too many combinations to test, and after correcting for that many tests, the number of samples required for statistical significance would be astronomical. We propose to develop new computational and statistical approaches to identify combinations of variants that are associated with a disease. In contrast to existing approaches, we do not restrict attention a priori to known pathways or groups of genes. Rather, our algorithms use genome-scale interaction networks and combinatorial/statistical constraints to identify combinations of variants and rigorously assess their statistical significance. We will further extend these approaches to find associations between combinations of variants and clinical parameters such as survival time or response to treatment. We will apply these techniques to cancer genome sequencing projects, including The Cancer Genome Atlas (TCGA), in collaboration with several biomedical research groups. Successful completion of the proposed research will facilitate the study of genetically heterogeneous diseases, and cancer in particular, using only the modest number of samples attainable with present DNA sequencing technologies.
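
The claim that combinations of variants cannot be tested exhaustively can be made concrete with a short count. The sketch below counts the k-gene sets an exhaustive search would need to test and the per-test p-value threshold each count implies. It is an illustration under stated assumptions, not the project's method: the figure of roughly 20,000 protein-coding genes is a round approximation, and Bonferroni correction is used only as the simplest form of family-wise error control.

```python
from math import comb

# Assumption: a round approximation of the number of human protein-coding genes.
NUM_GENES = 20_000
# Assumption: target family-wise error rate for the whole search.
ALPHA = 0.05


def num_combinations(n_genes: int, k: int) -> int:
    """Number of distinct k-gene sets that can be drawn from n_genes genes."""
    return comb(n_genes, k)


def bonferroni_threshold(num_tests: int, alpha: float = ALPHA) -> float:
    """Per-test p-value cutoff that keeps the family-wise error rate at alpha
    under Bonferroni correction."""
    return alpha / num_tests


def main() -> None:
    # Report how the search space and the required significance level
    # scale with the size k of the gene combination being tested.
    for k in (1, 2, 3, 4):
        m = num_combinations(NUM_GENES, k)
        print(f"k={k}: {m:,} tests, per-test threshold {bonferroni_threshold(m):.1e}")


if __name__ == "__main__":
    main()
```

Under these assumptions, testing all gene pairs already means about 2 × 10^8 tests and a per-test threshold near 2.5 × 10^-10, and triples push the threshold to roughly 4 × 10^-14. Reaching p-values that small is what drives the required sample sizes to astronomical levels.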

Public Health Relevance

Two major challenges in medicine are identifying the inherited genetic differences associated with a disease and identifying the acquired mutations that lead to cancer. Next-generation DNA sequencing technologies enable measurement of these genetic variants, but interpreting the resulting data demands new computational and statistical approaches. This is particularly true because many diseases are genetically heterogeneous, with many possible genetic causes. We will develop novel algorithms to aid the discovery of disease-causing genetic variants, with the goal of enabling better diagnostics and personalized treatments for various diseases.

National Institutes of Health (NIH)
National Human Genome Research Institute (NHGRI)
Research Project (R01)
Study Section: Genomics, Computational Biology and Technology Study Section (GCAT)
Program Officer: Struewing, Jeffery P
Brown University
Biostatistics & Other Math Sci
Schools of Arts and Sciences
United States
El-Kebir, Mohammed; Satas, Gryte; Raphael, Benjamin J (2018) Inferring parsimonious migration histories for metastatic cancers. Nat Genet 50:718-726
Oesper, Layla; Dantas, Simone; Raphael, Benjamin J (2017) Identifying simultaneous rearrangements in cancer genomes. Bioinformatics
Cancer Genome Atlas Research Network (2017) Integrated Genomic Characterization of Pancreatic Ductal Adenocarcinoma. Cancer Cell 32:185-203.e13
Nakka, Priyanka; Archer, Natalie P; Xu, Heng et al. (2017) Novel Gene and Network Associations Found for Acute Lymphoblastic Leukemia Using Case-Control and Family-Based Studies in Multiethnic Populations. Cancer Epidemiol Biomarkers Prev 26:1531-1539
Leiserson, Mark D M; Reyna, Matthew A; Raphael, Benjamin J (2016) A weighted exact test for mutually exclusive mutations in cancer. Bioinformatics 32:i736-i745
El-Kebir, Mohammed; Satas, Gryte; Oesper, Layla et al. (2016) Inferring the Mutational History of a Tumor Using Multi-state Perfect Phylogeny Mixtures. Cell Syst 3:43-53
Nakka, Priyanka; Raphael, Benjamin J; Ramachandran, Sohini (2016) Gene and Network Analysis of Common Variants Reveals Novel Associations in Multiple Complex Diseases. Genetics 204:783-798
Leiserson, Mark D M; Vandin, Fabio; Wu, Hsin-Ta et al. (2015) Pan-cancer network analysis identifies combinations of rare somatic mutations across pathways and protein complexes. Nat Genet 47:106-14
Lu, Charles; Xie, Mingchao; Wendl, Michael C et al. (2015) Patterns and functional implications of rare germline variants across 12 cancer types. Nat Commun 6:10086
Leiserson, Mark D M; Wu, Hsin-Ta; Vandin, Fabio et al. (2015) CoMEt: a statistical approach to identify combinations of mutually exclusive alterations in cancer. Genome Biol 16:160

Showing the most recent 10 out of 18 publications