Whole-genome sequencing of large cohorts of individuals is quickly becoming a common tool for researchers investigating the genetic basis of many disease phenotypes. The primary goals are to discover the underlying genetic variants that cause or contribute to these diseases and to correctly identify these variants in a diagnostic setting. These differences typically consist of single base changes (SNPs), but can also encompass larger, more complex chromosomal rearrangements in the form of structural variation (SV), which is much more difficult to detect even with modern sequencing technologies. A number of published approaches have studied this problem, but even the largest-scale endeavors have focused only on deletion events and reported a sensitivity of <70%. Complex chromosomal rearrangements are even less well studied. Thus, it is paramount to develop accurate methods that can detect all types of SVs at high specificity from sequence data. This proposal aims to improve the overall ability of researchers to identify and analyze genetic variation from whole genome sequences. An important, and often overlooked, aspect of SV discovery is that typical paired-end, read depth, and split read approaches will identify different sets of non-overlapping variants at varying degrees of accuracy.
In Aim 1, we will develop a unified SV discovery algorithm that can incorporate all of these different sources of information in a probabilistic fashion. Such a method would be useful for research, in particular for the identification of rare variants, as well as for clinical applications, which require a great deal of accuracy and have thus far been limited to older karyotyping and microarray approaches. This would identify the majority of structural variants; however, there are many regions in genomic sequences that are complex in nature, defined as consisting of multiple neighboring or overlapping chromosomal rearrangements that are challenging to resolve with typical SV detection approaches.
In Aim 2, we propose methods to resolve these complex regions and assess their frequency and impact. Furthermore, a crucial step in medical genetics is the comparison of identified genetic mutations to databases of known pathogenic and benign variants. This is currently problematic with SVs, as they have often been originally reported with varying degrees of breakpoint resolution that can hamper the correct assignment of the variant. This issue is compounded further in more complex regions with multiple breakpoints, for which simplistic comparison methods do not work well.
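To illustrate the comparison problem described above, the following is a minimal Python sketch; it is not the method proposed in this project. The `Deletion` record, its `slop` field (a symmetric breakpoint uncertainty in base pairs), and the example coordinates are all hypothetical, and reciprocal overlap is used here only as one commonly applied simple comparison.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Deletion:
    """A deletion call with uncertain breakpoints (hypothetical record type)."""
    chrom: str
    start: int      # best-guess start coordinate
    end: int        # best-guess end coordinate (exclusive)
    slop: int = 0   # assumed +/- uncertainty of each breakpoint, in bp


def reciprocal_overlap(a: Deletion, b: Deletion) -> float:
    """Return the smaller of the two overlap fractions (0.0 if disjoint)."""
    if a.chrom != b.chrom:
        return 0.0
    shared = min(a.end, b.end) - max(a.start, b.start)
    if shared <= 0:
        return 0.0
    return min(shared / (a.end - a.start), shared / (b.end - b.start))


def breakpoints_compatible(a: Deletion, b: Deletion) -> bool:
    """True if both breakpoints agree within the combined uncertainty."""
    if a.chrom != b.chrom:
        return False
    tolerance = a.slop + b.slop
    return (abs(a.start - b.start) <= tolerance
            and abs(a.end - b.end) <= tolerance)


# A coarsely resolved database entry versus a finely resolved sequencing call.
database_entry = Deletion("chr1", 10_000, 20_000, slop=5_000)
sequenced_call = Deletion("chr1", 14_500, 19_000, slop=10)

# The simple overlap test falls below a 50% threshold (assumed cutoff)...
print(reciprocal_overlap(database_entry, sequenced_call) >= 0.5)   # False
# ...even though the calls agree once breakpoint uncertainty is considered.
print(breakpoints_compatible(database_entry, sequenced_call))      # True
```

In this toy case the two records could describe the same event, yet a fixed overlap cutoff rejects the match because the database entry was reported at low breakpoint resolution. Regions with multiple breakpoints would need a richer representation than a single interval.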
In Aim 3, we will develop and implement a system that describes and utilizes variant profiles to identify whether an individual's sequence data contains a variant of interest. Overall, this project will advance our understanding of the human genome as well as provide tools for use in the general research and clinical communities.

Public Health Relevance

The rearrangement of chromosomal material in the form of structural variation is directly responsible for many disease phenotypes; however, our ability to detect and resolve these events from whole genome sequence data is currently limited. We propose a number of strategies for improving the detection and analysis of structural genomic variation between individuals and for resolving the underlying structure and function of these variants. These approaches will have direct application to the clinical diagnosis of such events and the future of personalized genomics.

Agency
National Institutes of Health (NIH)
Institute
National Human Genome Research Institute (NHGRI)
Type
Research Project (R01)
Project #
5R01HG007068-02
Application #
8733748
Study Section
Genomics, Computational Biology and Technology Study Section (GCAT)
Program Officer
Brooks, Lisa
Project Start
2013-09-13
Project End
2017-07-31
Budget Start
2014-08-01
Budget End
2015-07-31
Support Year
2
Fiscal Year
2014
Total Cost
$374,673
Indirect Cost
$129,951
Name
University of Michigan Ann Arbor
Department
Biostatistics & Other Math Sci
Type
Schools of Medicine
DUNS #
073133571
City
Ann Arbor
State
MI
Country
United States
Zip Code
48109
Zhao, Xuefang; Weber, Alexandra M; Mills, Ryan E (2017) A recurrence-based approach for validating structural variation using long-read sequencing technology. Gigascience 6:1-9
Hovelson, Daniel H; Liu, Chia-Jen; Wang, Yugang et al. (2017) Rapid, ultra low coverage copy number profiling of cell-free DNA as a precision oncology screening strategy. Oncotarget 8:89848-89866
Zhao, Xuefang; Emery, Sarah B; Myers, Bridget et al. (2016) Resolving complex structural genomic rearrangements using a randomized approach. Genome Biol 17:126
Chun, Sang Y; Rodriguez, Caitlin M; Todd, Peter K et al. (2016) SPECtre: a spectral coherence-based classifier of actively translated transcripts from ribosome profiling sequence data. BMC Bioinformatics 17:482
Sudmant, Peter H; Rausch, Tobias; Gardner, Eugene J et al. (2015) An integrated map of structural variation in 2,504 human genomes. Nature 526:75-81
1000 Genomes Project Consortium; Auton, Adam; Brooks, Lisa D et al. (2015) A global reference for human genetic variation. Nature 526:68-74
Dayama, Gargi; Emery, Sarah B; Kidd, Jeffrey M et al. (2014) The genomic landscape of polymorphic human nuclear mitochondrial insertions. Nucleic Acids Res 42:12640-9
Brand, Harrison; Pillalamarri, Vamsee; Collins, Ryan L et al. (2014) Cryptic and complex chromosomal aberrations in early-onset neuropsychiatric disorders. Am J Hum Genet 95:454-61
Park, Hansoo; Kim, Dohoon; Kim, Chun-Hyung et al. (2014) Increased genomic integrity of an improved protein-based mouse induced pluripotent stem cell method compared with current viral-induced strategies. Stem Cells Transl Med 3:599-609