The size of genetic data sets is growing exponentially. At the current rate of growth, the largest reference panels of phased, sequenced individuals will contain millions of individuals within 5-7 years. This research will address the computational challenges of performing genotype phasing and imputation in large cohorts and with large reference panels. Large cohorts from outbred populations typically contain a mixture of nominally unrelated and closely related individuals. Current phasing methods for these large data sets do not model parent-offspring or other close relationships. We will develop a new phasing method that greatly increases phase accuracy in closely related individuals and that scales to large sample sizes. Increasing reference panel size increases genotype phase and imputation accuracy, but it also increases computational cost. We will develop a new reference file format that substantially reduces the computational cost of imputation and phasing with large reference panels. We will provide a format specification, software, and software libraries so that other researchers and software developers can readily use the new reference file format. We will develop a new computational method for finding shared haplotype segments between a reference panel and a target haplotype. This new method will significantly reduce the computational cost of phasing and imputation with large reference panels. Finally, we will extend the fastest, most accurate method for genotype phasing and imputation (Beagle 5.0) to analyse chromosome X data. This extension will improve genetic studies of this important chromosome.

Public Health Relevance

This research will develop new methods and software that improve our ability to determine which genetic variants are co-inherited from the same parent. These methods will improve scientists' ability to identify genetic variants that increase or decrease risk of disease, and they will contribute to the prevention, diagnosis, and treatment of heritable diseases in the United States and throughout the world.

Agency
National Institutes of Health (NIH)
Institute
National Human Genome Research Institute (NHGRI)
Type
Research Project (R01)
Project #
2R01HG008359-04
Application #
9737636
Study Section
Genomics, Computational Biology and Technology Study Section (GCAT)
Program Officer
Sen, Shurjo Kumar
Project Start
2015-09-15
Project End
2023-05-31
Budget Start
2019-08-01
Budget End
2020-05-31
Support Year
4
Fiscal Year
2019
Total Cost
Indirect Cost
Name
University of Washington
Department
Internal Medicine/Medicine
Type
Schools of Medicine
DUNS #
605799469
City
Seattle
State
WA
Country
United States
Zip Code
98195
Browning, Brian L; Zhou, Ying; Browning, Sharon R (2018) A One-Penny Imputed Genome from Next-Generation Reference Panels. Am J Hum Genet 103:338-348
Stamatoyannopoulos, George; Bose, Aritra; Teodosiadis, Athanasios et al. (2017) Genetics of the peloponnesean populations and the theory of extinction of the medieval peloponnesean Greeks. Eur J Hum Genet 25:637-645
Jarvik, Gail P; Browning, Brian L (2016) Consideration of Cosegregation in the Pathogenicity Classification of Genomic Variants. Am J Hum Genet 98:1077-1081
Browning, Brian L; Browning, Sharon R (2016) Genotype Imputation with Millions of Reference Samples. Am J Hum Genet 98:116-26