Advanced strategies for genotype imputation

Rosenberg, Noah; Zoellner, Sebastian

Abstract

Recent genome-wide association (GWA) studies have identified many alleles contributing to disease susceptibility. Genotype imputation methods have been a key contributor to this success. These statistical approaches leverage dense genotypes in publicly available reference panels to estimate genotypes at millions of unmeasured genetic markers in a GWA study. Thus, they enable investigators to test many more markers for disease association beyond those that have been experimentally measured, thereby improving power to detect risk variants. With the recent advent of next-generation sequencing technologies that will facilitate the testing of rare genetic variants for disease association, the importance of imputation is only likely to increase.

However, several challenges for optimizing the application of imputation methods remain unaddressed. While imputation accuracy depends on the use of appropriate reference individuals, limited data exist on how to optimally choose the individuals used as a template, particularly in admixed populations such as African Americans and Hispanic/Latino populations. Moreover, the performance of imputation algorithms has been evaluated primarily for common genetic variants. As genetic studies begin to focus on rare variation as a potentially important source for unexplained heritable disease risk, it is essential to improve the properties of genotype imputation for such polymorphisms.

Four projects are proposed for addressing these issues. First, imputation accuracy and statistical power will be evaluated in African Americans and in a Hispanic/Latino population, using multiple existing reference datasets, imputation algorithms, and imputation accuracy measures. This project will facilitate the identification of disease-susceptibility loci in African Americans and Hispanic/Latino populations by optimizing imputation in these populations. Second, new model-based statistical techniques for imputation will be devised by considering the unique mosaic structure of genomes of admixed individuals. This work builds on the popular fastPHASE software to further enhance imputation in admixed populations. Third, methods of imputing rare variants, including copy-number variants, will be devised and tested. This analysis will enable the use of rare variants in GWA tests, thereby improving the prospects for uncovering their effects on disease risk. Fourth, algorithms will be developed for optimally selecting individuals for resequencing and use as template individuals for imputation. This work will enhance the design of forthcoming GWA studies that will incorporate resequencing data on subsets of the sample.

The projects will be accomplished through a combination of simulation, theory, and computational analysis. Furthermore, algorithms will be applied using datasets on African Americans from Baltimore, Mexican Americans from Starr County, Texas, and the 1000 Genomes Project. Statistical resources generated from the project, which will be disseminated in publicly available software, will provide essential tools for facilitating the ongoing effort of mapping disease genes, particularly in African Americans and Hispanic/Latino populations.

Public Health Relevance

Many disease genes have been identified by "association studies" that search the human genome for genetic variants that occur more frequently in individuals who carry a disease than in control individuals. We will improve the prospects for identifying disease genes by determining the best statistical strategies for combining data from genetic association studies with data from existing databases. Our project will provide guidelines about optimal study characteristics and statistical methods to find disease genes in understudied, informative populations such as African Americans and Mexican Americans.

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Human Genome Research Institute (NHGRI)
Type: Research Project (R01)
Project #: 5R01HG005855-04
Application #: 8293397
Study Section: Genomics, Computational Biology and Technology Study Section (GCAT)
Program Officer: Brooks, Lisa

Project Start: 2010-09-13
Project End: 2015-06-30
Budget Start: 2012-07-01
Budget End: 2013-06-30
Support Year: 4
Fiscal Year: 2012
Total Cost: $384,360
Indirect Cost: $60,405

Institution

Name: Stanford University
Department: Biology
Type: Schools of Arts and Sciences
DUNS #: 009214214

City: Stanford
State: CA
Country: United States
Zip Code: 94305

Related projects


NIH 2020 R01 HG	Population genetics for large-scale sequencing studies of diverse populations Rosenberg, Noah; Zoellner, Sebastian / Stanford University
NIH 2019 R01 HG	Population genetics for large-scale sequencing studies of diverse populations Rosenberg, Noah; Zoellner, Sebastian / Stanford University
NIH 2018 R01 HG	Population genetics for large-scale sequencing studies of diverse populations Rosenberg, Noah; Zoellner, Sebastian / Stanford University
NIH 2017 R01 HG	Population genetics for large-scale sequencing studies of diverse populations Rosenberg, Noah; Zoellner, Sebastian / Stanford University
NIH 2014 R01 HG	Advanced strategies for genotype imputation Rosenberg, Noah; Zoellner, Sebastian / Stanford University
NIH 2013 R01 HG	Advanced strategies for genotype imputation Rosenberg, Noah; Zoellner, Sebastian / Stanford University	$366,830
NIH 2012 R01 HG	Advanced strategies for genotype imputation Rosenberg, Noah; Zoellner, Sebastian / Stanford University	$384,360
NIH 2011 R01 HG	Advanced strategies for genotype imputation Rosenberg, Noah; Zoellner, Sebastian / Stanford University	$463,811
NIH 2010 R01 HG	Advanced strategies for genotype imputation Rosenberg, Noah; Zoellner, Sebastian / University of Michigan Ann Arbor	$377,783

Publications

Aw, Alan J; Rosenberg, Noah A (2018) Bounding measures of genetic similarity and diversity using majorization. J Math Biol 77:711-737

Reppell, M; Zöllner, S (2018) An efficient algorithm for generating the internal branches of a Kingman coalescent. Theor Popul Biol 122:57-66

Kim, Jaehee; Edge, Michael D; Algee-Hewitt, Bridget F B et al. (2018) Statistical Detection of Relatives Typed with Disjoint Forensic and Biomedical Loci. Cell 175:848-858.e6

Arbisser, Ilana M; Jewett, Ethan M; Rosenberg, Noah A (2018) On the joint distribution of tree height and tree length under the coalescent. Theor Popul Biol 122:46-56

Edge, Michael D; Algee-Hewitt, Bridget F B; Pemberton, Trevor J et al. (2017) Linkage disequilibrium matches forensic genetic records to disjoint genomic marker sets. Proc Natl Acad Sci U S A 114:5671-5676

Vattathil, Selina; Scheet, Paul (2016) Extensive Hidden Genomic Mosaicism Revealed in Normal Tissue. Am J Hum Genet 98:571-578

Kang, Jonathan T L; Goldberg, Amy; Edge, Michael D et al. (2016) Consanguinity Rates Predict Long Runs of Homozygosity in Jewish Populations. Hum Hered 82:87-102

Kang, Jonathan T L; Zhang, Peng; Zöllner, Sebastian et al. (2015) Choosing Subsamples for Sequencing Studies by Minimizing the Average Distance to the Closest Leaf. Genetics 201:499-511

Lo, Yancy; Kang, Hyun M; Nelson, Matthew R et al. (2015) Comparing variant calling algorithms for target-exon sequencing in a large sample. BMC Bioinformatics 16:75

Buzbas, Erkan O; Rosenberg, Noah A (2015) AABC: approximate approximate Bayesian computation for inference in population-genetic models. Theor Popul Biol 99:31-42

Showing the most recent 10 out of 43 publications
