Discovery and analysis of structural variation in whole genome sequences

Mills, Ryan

Abstract

Whole genome sequencing of large cohorts of individuals is quickly becoming a common tool for researchers investigating the genetic basis of many disease phenotypes. The primary goals are to discover the underlying genetic variation that causes or contributes to these diseases and to correctly identify these variants in a diagnostic setting. These variants typically consist of single base changes (SNPs), but can also encompass larger, more complex chromosomal rearrangements in the form of structural variation (SV), which are much more difficult to detect even with modern sequencing technologies. A number of published approaches have addressed this problem, but even the largest-scale endeavors have focused only on deletion events and reported a sensitivity of <70%. Complex chromosomal rearrangements are even less well studied. Thus, it is paramount to develop accurate methods that can detect all types of SVs at high specificity from sequence data. This proposal aims to improve the overall ability of researchers to identify and analyze genetic variation from whole genome sequences. An important, and often overlooked, aspect of SV discovery is that typical paired-end, read depth, and split read approaches each identify different, largely non-overlapping sets of variants with varying degrees of accuracy.
In Aim 1, we will develop a unified SV discovery algorithm that can incorporate all of these different sources of information in a probabilistic fashion. Such a method would be useful for research, in particular for the identification of rare variants, as well as for clinical applications, which require a great deal of accuracy and have thus far been limited to older karyotyping and microarray approaches. This would identify the majority of structural variants; however, many regions in genomic sequences are complex in nature, consisting of multiple neighboring or overlapping chromosomal rearrangements that are challenging to resolve with typical SV detection approaches.
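
As a purely illustrative sketch of what combining evidence "in a probabilistic fashion" can look like, the snippet below adds per-signal log-likelihood ratios to a prior log-odds, naive-Bayes style. It is not the algorithm proposed in this grant: the function name `combine_evidence`, the signal keys, the independence assumption, and the prior value are all hypothetical choices made for the example.

```python
import math


def combine_evidence(log_lrs, prior=0.01):
    """Posterior probability that an SV is present at a locus.

    log_lrs: dict mapping a signal name (e.g. "paired_end", "read_depth",
             "split_read") to a natural-log likelihood ratio
             log P(data | SV) - log P(data | no SV).
    prior:   assumed prior probability of an SV at the locus (hypothetical).

    Assumes the signals are conditionally independent, which is a
    simplification made only for this sketch.
    """
    prior_log_odds = math.log(prior / (1.0 - prior))
    posterior_log_odds = prior_log_odds + sum(log_lrs.values())
    # Logistic function converts log-odds back to a probability.
    return 1.0 / (1.0 + math.exp(-posterior_log_odds))


# Example: three signals that each weakly or moderately support an SV.
support = {"paired_end": 2.0, "read_depth": 1.0, "split_read": 3.0}
posterior = combine_evidence(support, prior=0.01)
```

A signal that argues against the variant contributes a negative log-likelihood ratio, so conflicting evidence lowers the posterior instead of being ignored.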
In Aim 2, we propose methods to resolve these complex regions and assess their frequency and impact. Furthermore, a crucial step in medical genetics is the comparison of identified genetic mutations to databases of known pathogenic and benign variants. This is currently problematic for SVs, as they were often originally reported with varying degrees of breakpoint resolution, which can hamper the correct assignment of the variant. This issue is compounded in more complex regions with multiple breakpoints, for which simplistic comparison methods do not work well.
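
To make the breakpoint-resolution problem concrete, here is a minimal sketch of two simple comparison rules: a breakpoint check that tolerates each record's stated uncertainty, and a reciprocal-overlap score. The `SV` record, its `start_ci`/`end_ci` fields, and both function names are assumptions introduced for illustration; they are not the variant-profile system described in Aim 3, and rules this simple are the kind that break down in multi-breakpoint regions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SV:
    """Hypothetical minimal SV record; coordinates are in base pairs."""
    chrom: str
    svtype: str
    start: int
    end: int
    start_ci: int = 0  # +/- uncertainty around start
    end_ci: int = 0    # +/- uncertainty around end


def breakpoints_compatible(a, b):
    """True if both breakpoints agree within the combined uncertainty."""
    if a.chrom != b.chrom or a.svtype != b.svtype:
        return False
    start_ok = abs(a.start - b.start) <= a.start_ci + b.start_ci
    end_ok = abs(a.end - b.end) <= a.end_ci + b.end_ci
    return start_ok and end_ok


def reciprocal_overlap(a, b):
    """Smaller of the two fractions of each interval covered by the other."""
    if a.chrom != b.chrom:
        return 0.0
    overlap = min(a.end, b.end) - max(a.start, b.start)
    if overlap <= 0:
        return 0.0
    return min(overlap / (a.end - a.start), overlap / (b.end - b.start))


# A low-resolution database entry versus a sequence-resolved call.
known = SV("chr1", "DEL", 10_000, 20_000, start_ci=500, end_ci=500)
called = SV("chr1", "DEL", 10_250, 19_800, start_ci=5, end_ci=5)
same_event = breakpoints_compatible(known, called)
overlap_score = reciprocal_overlap(known, called)
```

Here the two records differ by a few hundred base pairs at each end, yet they are treated as the same event because the database entry carries a wide uncertainty window.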
In Aim 3, we will develop and implement a system that describes and utilizes variant profiles to identify whether an individual's sequence data contains a variant of interest. Overall, this project will advance our understanding of the human genome as well as provide tools for use in the general research and clinical communities.

Public Health Relevance

The rearrangement of chromosomal material in the form of structural variation is directly responsible for many disease phenotypes; however, our ability to detect and resolve these events from whole genome sequence data is currently limited. We propose a number of strategies for improving the detection and analysis of structural genomic variation between individuals and for resolving the underlying structure and function of these variants. These approaches will have direct application to the clinical diagnosis of such events and to the future of personalized genomics.

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Human Genome Research Institute (NHGRI)
Type: Research Project (R01)
Project #: 5R01HG007068-02
Application #: 8733748
Study Section: Genomics, Computational Biology and Technology Study Section (GCAT)
Program Officer: Brooks, Lisa

Project Start: 2013-09-13
Project End: 2017-07-31
Budget Start: 2014-08-01
Budget End: 2015-07-31
Support Year: 2
Fiscal Year: 2014
Total Cost: $374,673
Indirect Cost: $129,951

Institution

Name: University of Michigan Ann Arbor
Department: Biostatistics & Other Math Sci
Type: Schools of Medicine
DUNS #: 073133571

City: Ann Arbor
State: MI
Country: United States
Zip Code: 48109

Related projects
NIH 2016 R01 HG	Discovery and analysis of structural variation in whole genome sequences Mills, Ryan E. / University of Michigan Ann Arbor
NIH 2015 R01 HG	Discovery and analysis of structural variation in whole genome sequences Mills, Ryan E. / University of Michigan Ann Arbor	$371,408
NIH 2014 R01 HG	Discovery and analysis of structural variation in whole genome sequences Mills, Ryan E. / University of Michigan Ann Arbor	$374,673
NIH 2013 R01 HG	Discovery and analysis of structural variation in whole genome sequences Mills, Ryan E. / University of Michigan Ann Arbor	$382,699

Publications

Zhao, Xuefang; Weber, Alexandra M; Mills, Ryan E (2017) A recurrence-based approach for validating structural variation using long-read sequencing technology. Gigascience 6:1-9

Hovelson, Daniel H; Liu, Chia-Jen; Wang, Yugang et al. (2017) Rapid, ultra low coverage copy number profiling of cell-free DNA as a precision oncology screening strategy. Oncotarget 8:89848-89866

Zhao, Xuefang; Emery, Sarah B; Myers, Bridget et al. (2016) Resolving complex structural genomic rearrangements using a randomized approach. Genome Biol 17:126

Chun, Sang Y; Rodriguez, Caitlin M; Todd, Peter K et al. (2016) SPECtre: a spectral coherence--based classifier of actively translated transcripts from ribosome profiling sequence data. BMC Bioinformatics 17:482

Sudmant, Peter H; Rausch, Tobias; Gardner, Eugene J et al. (2015) An integrated map of structural variation in 2,504 human genomes. Nature 526:75-81

1000 Genomes Project Consortium; Auton, Adam; Brooks, Lisa D et al. (2015) A global reference for human genetic variation. Nature 526:68-74

Dayama, Gargi; Emery, Sarah B; Kidd, Jeffrey M et al. (2014) The genomic landscape of polymorphic human nuclear mitochondrial insertions. Nucleic Acids Res 42:12640-9

Brand, Harrison; Pillalamarri, Vamsee; Collins, Ryan L et al. (2014) Cryptic and complex chromosomal aberrations in early-onset neuropsychiatric disorders. Am J Hum Genet 95:454-61

Park, Hansoo; Kim, Dohoon; Kim, Chun-Hyung et al. (2014) Increased genomic integrity of an improved protein-based mouse induced pluripotent stem cell method compared with current viral-induced strategies. Stem Cells Transl Med 3:599-609
