Structural variants, including duplications, insertions, deletions, inversions, and translocations of large blocks of DNA sequence, have been associated with numerous human diseases. These variants also occur frequently as somatic alterations in cancer. Identifying and characterizing structural variants in a genome sequence remains challenging. We propose to develop computational methods that enable comprehensive studies of structural variation in normal and diseased genomes.
In Aim 1, we will develop a general computational framework for classifying and comparing structural variants across multiple samples and measurement platforms, using a novel geometric and probabilistic approach.
In Aim 2, we will design algorithms that make the most effective use of emerging single-molecule sequencing technologies for detecting and assembling complex structural variants and rearranged transcripts.
In Aim 3, we will develop algorithms to reconstruct the organization of cancer genomes and will investigate how structural variants alter genome organization during somatic evolution.
Finally, in Aim 4, we will study the population genetics of inversion polymorphisms in the human genome, examining their effects on haplotype block structure and asking whether inversions under selection leave distinctive genetic signatures.
We will apply these approaches to data from human, cancer, mouse, and pathogen genomes in collaboration with several biomedical researchers. Successful completion of the proposed studies will facilitate future research on the role of structural variation in human and cancer genetics.
Identifying the inherited genetic differences associated with disease, and the acquired mutations that lead to cancer, is a major challenge in genomics. One important class of such alterations is structural variants, which include duplications, insertions, deletions, inversions, and translocations of large blocks of DNA sequence. These variants have been implicated in several diseases, including autism and cancer. New genome technologies enable large-scale measurement of these variants but require novel computational methods to extract the most information from these measurements. We will develop several algorithms to facilitate the identification and characterization of structural variants. These approaches will aid in the discovery of genetic variants that may lead to better diagnostics and personalized treatments for various diseases.
Parks, Matthew M; Raphael, Benjamin J; Lawrence, Charles E (2018) Using controls to limit false discovery in the era of big data. BMC Bioinformatics 19:323
Leiserson, Mark D M; Reyna, Matthew A; Raphael, Benjamin J (2016) A weighted exact test for mutually exclusive mutations in cancer. Bioinformatics 32:i736-i745
El-Kebir, Mohammed; Satas, Gryte; Oesper, Layla et al. (2016) Inferring the Mutational History of a Tumor Using Multi-state Perfect Phylogeny Mixtures. Cell Syst 3:43-53
Weinreb, Caleb; Raphael, Benjamin J (2016) Identification of hierarchical chromatin domains. Bioinformatics 32:1601-9
Leiserson, Mark D M; Gramazio, Connor C; Hu, Jason et al. (2015) MAGI: visualization and collaborative annotation of genomic aberrations. Nat Methods 12:483-4
Parks, Matthew M; Lawrence, Charles E; Raphael, Benjamin J (2015) Detecting non-allelic homologous recombination from high-throughput sequencing data. Genome Biol 16:72
Doris, Stephen M; Smith, Deborah R; Beamesderfer, Julia N et al. (2015) Universal and domain-specific sequences in 23S-28S ribosomal RNA identified by computational phylogenetics. RNA 21:1719-30
Leiserson, Mark D M; Vandin, Fabio; Wu, Hsin-Ta et al. (2015) Pan-cancer network analysis identifies combinations of rare somatic mutations across pathways and protein complexes. Nat Genet 47:106-14
Leiserson, Mark D M; Wu, Hsin-Ta; Vandin, Fabio et al. (2015) CoMEt: a statistical approach to identify combinations of mutually exclusive alterations in cancer. Genome Biol 16:160
El-Kebir, Mohammed; Oesper, Layla; Acheson-Field, Hannah et al. (2015) Reconstruction of clonal trees and tumor composition from multi-sample sequencing data. Bioinformatics 31:i62-70
The 10 most recent of 40 publications are listed.