The size of genetic data sets is growing exponentially. At the current rate of growth, the largest reference panels of phased, sequenced individuals will have millions of individuals within 5-7 years. This research will address the computational challenges of performing genotype phasing and imputation in large cohorts and with large reference panels. Large cohorts from outbred populations typically contain a mixture of nominally unrelated and closely related individuals. Current phasing methods for these large data sets do not model parent-offspring or other close relationships. We will develop a new phasing method that greatly increases phase accuracy in closely related individuals and that scales to large sample sizes. Increasing reference panel size improves genotype phasing and imputation accuracy, but it also increases computational cost. We will develop a new reference file format that substantially reduces the computational cost of imputation and phasing with large reference panels. We will provide a format specification, software, and software libraries so that other researchers and software developers can readily use the new reference file format. We will develop a new computational method for finding shared haplotype segments between a reference panel and a target haplotype. This new method will significantly reduce the computational cost of phasing and imputation using large reference panels. Finally, we will extend the fastest, most accurate method for genotype phasing and imputation (Beagle 5.0) to analyse chromosome X data. This extension will improve genetic studies of this important chromosome.

Public Health Relevance

This research will develop new methods and software that improve our ability to determine which genetic variants are co-inherited from the same parent. These methods will improve scientists' ability to identify genetic variants that increase or decrease risk of disease, and they will contribute to the prevention, diagnosis, and treatment of heritable diseases in the United States and throughout the world.

National Institutes of Health (NIH)
National Human Genome Research Institute (NHGRI)
Research Project (R01)
Study Section
Genomics, Computational Biology and Technology Study Section (GCAT)
Program Officer
Sen, Shurjo Kumar
University of Washington
Internal Medicine/Medicine
Schools of Medicine
United States
Browning, Brian L; Zhou, Ying; Browning, Sharon R (2018) A One-Penny Imputed Genome from Next-Generation Reference Panels. Am J Hum Genet 103:338-348
Stamatoyannopoulos, George; Bose, Aritra; Teodosiadis, Athanasios et al. (2017) Genetics of the peloponnesean populations and the theory of extinction of the medieval peloponnesean Greeks. Eur J Hum Genet 25:637-645
Jarvik, Gail P; Browning, Brian L (2016) Consideration of Cosegregation in the Pathogenicity Classification of Genomic Variants. Am J Hum Genet 98:1077-1081
Browning, Brian L; Browning, Sharon R (2016) Genotype Imputation with Millions of Reference Samples. Am J Hum Genet 98:116-126