Structural variants, including duplications, insertions, deletions, inversions, and translocations of large blocks of DNA sequence, have been shown to be associated with various human diseases. These variants also frequently occur as somatic alterations in cancer. Identifying and characterizing structural variants in a genome sequence is a challenging task. We propose to develop computational methods to enable comprehensive studies of structural variation in normal and diseased genomes.
In Aim 1, we develop a general computational framework for classification and comparison of structural variants across multiple samples and measurement platforms using a novel geometric and probabilistic approach.
In Aim 2, we design algorithms to maximize the effectiveness of emerging single-molecule sequencing technologies for detecting and assembling complex structural variants and rearranged transcripts.
In Aim 3, we develop algorithms to reconstruct the organization of cancer genomes and investigate how structural variants alter genome organization during somatic evolution.
Finally, in Aim 4, we study the population genetics of inversion polymorphisms in the human genome, including their effects on haplotype block structure and whether inversions under selection leave distinctive genetic signatures.
We will apply these approaches to data from human, cancer, mouse, and pathogen genomes in collaboration with several biomedical researchers. Successful completion of the proposed studies will facilitate future research into the role of structural variation in human and cancer genetics.

Public Health Relevance

Identifying the inherited genetic differences associated with disease and the acquired mutations that lead to cancer is a major challenge in genomics. One important class of such mutations is structural variants, which include duplications, insertions, deletions, inversions, and translocations of large blocks of DNA sequence. These variants have been implicated in several diseases, including autism and cancer. New genome technologies are enabling large-scale measurement of these variants but demand novel computational methods to extract the most information from these measurements. We will develop algorithms to facilitate the identification and characterization of structural variants. These approaches will aid in the discovery of genetic variants that can lead to better diagnostics and personalized treatments for various diseases.

National Institutes of Health (NIH)
National Human Genome Research Institute (NHGRI)
Research Project (R01)
Study Section: Genomics, Computational Biology and Technology Study Section (GCAT)
Program Officer: Brooks, Lisa
Brown University
Biostatistics & Other Math Sci
Schools of Arts and Sciences
United States
Kanchi, Krishna L; Johnson, Kimberly J; Lu, Charles et al. (2014) Integrated analysis of germline and somatic variants in ovarian cancer. Nat Commun 5:3156
Hoadley, Katherine A; Yau, Christina; Wolf, Denise M et al. (2014) Multiplatform analysis of 12 cancer types reveals molecular classification within and across tissues of origin. Cell 158:929-44
Raphael, Benjamin J; Dobson, Jason R; Oesper, Layla et al. (2014) Identifying driver mutations in sequenced cancer genomes: computational approaches to enable precision medicine. Genome Med 6:5
Wu, Hsin-Ta; Hajirasouliha, Iman; Raphael, Benjamin J (2014) Detecting independent and recurrent copy number aberrations using interval graphs. Bioinformatics 30:i195-203
Ritz, Anna; Bashir, Ali; Sindi, Suzanne et al. (2014) Characterization of structural variants with single molecule and hybrid sequencing approaches. Bioinformatics 30:3458-66
Oesper, Layla; Satas, Gryte; Raphael, Benjamin J (2014) Quantifying tumor heterogeneity in whole-genome and whole-exome sequencing data. Bioinformatics 30:3532-40
Hajirasouliha, Iman; Mahmoody, Ahmad; Raphael, Benjamin J (2014) A combinatorial approach for analyzing intra-tumor heterogeneity from high-throughput sequencing data. Bioinformatics 30:i78-86
Ding, Li; Wendl, Michael C; McMichael, Joshua F et al. (2014) Expanding the computational toolbox for mining cancer genomes. Nat Rev Genet 15:556-70
Batchelor, Eric; Kann, Maricel G; Przytycka, Teresa M et al. (2013) Modeling cell heterogeneity: from single-cell variations to mixed cells. Pac Symp Biocomput :445-50
Oesper, Layla; Mahmoody, Ahmad; Raphael, Benjamin J (2013) THetA: inferring intra-tumor heterogeneity from high-throughput DNA sequencing data. Genome Biol 14:R80

Showing the most recent 10 out of 19 publications