Fast and powerful extensions of mixed model methods for GWAS

Loh, Po-Ru

Abstract

Genome-wide association studies (GWAS) have improved our understanding of the genetic architectures of many complex diseases and hold the promise of identifying genomic loci of causal variants and enabling accurate genetic risk prediction. However, because most traits of medical interest are influenced by a multitude of genetic factors, each of which explains only a small fraction of heritability, cohort sizes on the scale of hundreds of thousands of individuals will be necessary to provide the statistical power required to detect these elusive associations. This proposal aims to develop fast and powerful statistical methods addressing key challenges that arise in modeling such large-scale data sets: correcting for subtle confounding from population stratification or cryptic relatedness among study participants while maintaining computational tractability. The current state-of-the-art approach to association testing uses linear mixed models to simultaneously model the effects of all markers while accounting for sample structure. Existing mixed model techniques are computationally expensive, however, and also assume that all markers have nonzero effects. This proposal aims to extend mixed model methods by developing and implementing a new well-calibrated mixed model statistic that can be computed very quickly and tailored to more realistic genetic architectures. The first specific aim is to develop a novel method that analyzes linkage disequilibrium patterns to calibrate mixed model association test scores, distinguishing genome-wide inflation of test statistics due to sample structure from apparent inflation that is actually the true result of many causal loci. This method will safeguard against the twin dangers of false-positive associations from confounding and power loss from overly conservative calibration.
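One way to picture the calibration idea in the first aim is a regression of association statistics on a per-marker measure of linkage disequilibrium. The sketch below is illustrative only and is not taken from the proposal: it assumes a simple linear model in which the expected chi-square statistic equals an intercept (inflation shared by all markers, as confounding would produce) plus a slope times the marker's LD score (inflation that grows with LD, as polygenic signal would produce). The function name, the toy parameter values, and the simulated data are all assumptions.

```python
import numpy as np

def ld_regression_calibration(chi2, ld_scores):
    """Fit chi2 ~ intercept + slope * ld_score by ordinary least squares.

    Under the assumed model, the intercept captures inflation common to
    all markers, and the slope captures inflation that scales with LD.
    """
    chi2 = np.asarray(chi2, dtype=float)
    ld_scores = np.asarray(ld_scores, dtype=float)
    # np.polyfit returns coefficients from highest degree to lowest.
    slope, intercept = np.polyfit(ld_scores, chi2, 1)
    return intercept, slope

# Simulated toy data (hypothetical values, not real GWAS results).
rng = np.random.default_rng(0)
n_markers = 50000
ld_scores = rng.uniform(1.0, 200.0, size=n_markers)
true_intercept, true_slope = 1.10, 0.004
expected_chi2 = true_intercept + true_slope * ld_scores
# Scale chi-square(1) draws (mean 1) so each marker has the desired mean.
chi2 = expected_chi2 * rng.chisquare(1, size=n_markers)

intercept, slope = ld_regression_calibration(chi2, ld_scores)
mean_chi2 = chi2.mean()
```

In this toy setting the mean chi-square is well above the fitted intercept, so dividing every statistic by the mean would be more conservative than dividing by the intercept alone; that gap is the power loss the aim describes.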
The second aim is to develop a fast algorithm that applies modern iterative methods for numerical linear algebra to reduce the computational complexity of mixed model association testing to linear in the data size. This advance will enable mixed model analysis to remain feasible as study sizes increase, unlocking associations from rare or small-effect variants.
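As a rough illustration of how an iterative solver can avoid forming a relatedness matrix, the sketch below uses the conjugate gradient method to solve (X X^T / M + delta I) v = y, touching the genotype matrix X only through matrix-vector products, so each iteration costs time proportional to the size of X. The abstract does not name a specific solver; conjugate gradient, the variable names, the value of delta, and the random stand-in for genotypes are assumptions made for this example.

```python
import numpy as np

def conjugate_gradient(matvec, b, tol=1e-8, max_iter=500):
    """Solve A x = b for symmetric positive definite A, given only matvec(v) = A v."""
    x = np.zeros_like(b)
    r = b.copy()              # residual b - A x for the starting guess x = 0
    p = r.copy()              # initial search direction
    rs_old = r @ r
    b_norm = np.sqrt(b @ b)
    for _ in range(max_iter):
        if np.sqrt(rs_old) <= tol * b_norm:
            break             # relative residual is small enough
        Ap = matvec(p)
        alpha = rs_old / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        p = r + (rs_new / rs_old) * p
        rs_old = rs_new
    return x

# Toy stand-in for a standardized genotype matrix (N samples x M markers).
rng = np.random.default_rng(1)
n_samples, n_markers = 200, 500
X = rng.standard_normal((n_samples, n_markers))
y = rng.standard_normal(n_samples)
delta = 1.0  # hypothetical ratio of residual to genetic variance

def apply_covariance(v):
    # Computes (X X^T / M + delta I) v without building the N x N matrix.
    return X @ (X.T @ v) / n_markers + delta * v

v_cg = conjugate_gradient(apply_covariance, y)

# Dense reference solution, feasible only because the toy problem is small.
K = X @ X.T / n_markers
v_direct = np.linalg.solve(K + delta * np.eye(n_samples), y)
```

The dense solve is included only as a check; at the cohort sizes discussed here, building the N x N matrix is the step an iterative approach is meant to avoid.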
The third aim is to extend the method to model genetic architectures in which most markers have no disease association, as is widely believed, thereby improving statistical power. All of these techniques will be validated in simulation, implemented in software released to the scientific community, and applied to real GWAS data sets to search for additional associations that reach significance.
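To make the contrast in the third aim concrete, the sketch below compares two priors on a single marker's effect: a normal prior, under which every marker has a nonzero effect, and a "spike-and-slab" prior, under which a marker has zero effect with high probability. The abstract does not specify the form of the prior; the spike-and-slab choice, the function names, and all numeric values here are assumptions for illustration. Given an effect estimate with known sampling variance, the code computes the posterior probability that the marker is causal and the posterior mean effect.

```python
import numpy as np

def normal_pdf(x, var):
    """Density of a zero-mean normal distribution with variance var."""
    return np.exp(-0.5 * x * x / var) / np.sqrt(2.0 * np.pi * var)

def spike_slab_posterior_mean(beta_hat, se2, p_causal, slab_var):
    """Posterior mean effect and inclusion probability per marker.

    Assumed model: beta_hat ~ N(beta, se2), with beta = 0 with probability
    1 - p_causal and beta ~ N(0, slab_var) with probability p_causal.
    """
    beta_hat = np.asarray(beta_hat, dtype=float)
    like_slab = p_causal * normal_pdf(beta_hat, slab_var + se2)
    like_spike = (1.0 - p_causal) * normal_pdf(beta_hat, se2)
    pip = like_slab / (like_slab + like_spike)   # P(marker is causal | data)
    shrink = slab_var / (slab_var + se2)         # shrinkage if causal
    return pip * shrink * beta_hat, pip

# Hypothetical effect estimates with standard error 0.01.
beta_hat = np.array([0.0, 0.02, 0.5])
se2 = 1e-4
post_sparse, pip_sparse = spike_slab_posterior_mean(
    beta_hat, se2, p_causal=0.01, slab_var=0.01)

# With p_causal = 1 the prior is a single normal and the estimate is
# shrunk by the same factor for every marker.
post_dense, pip_dense = spike_slab_posterior_mean(
    beta_hat, se2, p_causal=1.0, slab_var=0.01)
```

Under the sparse prior, small estimates are shrunk almost to zero while large ones are left nearly intact; under the single normal prior, every estimate is shrunk by the same factor.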

Public Health Relevance

Although genome-wide association studies have improved our understanding of the genetic bases of many complex diseases, most traits of interest have hundreds or thousands of causal factors that are extremely difficult to detect. This proposal aims to advance the statistical methodology used to detect associations by improving statistical power and reducing the computational burden of large-scale data analysis. The techniques developed will enable continued discovery of disease-associated genetic variants and more accurate prediction of genetic risk.

Although genome-wide association studies (GWAS) have been successful in improving our understanding of the genetic architectures of many complex diseases, most traits of interest are highly polygenic (i.e., influenced by many genetic factors) and thus challenging to decipher: in a typical scenario, the tens or hundreds of associated loci that have been identified to date each explain a small percentage of phenotypic variance, collectively accounting for only a fraction of estimated heritability. In order to detect associations of such small magnitudes, it is critical to maximize the power obtainable from available samples and to account for subtle confounders such as population stratification and cryptic relatedness. The current state-of-the-art approach uses linear mixed models to simultaneously model the effects of all markers while accounting for sample structure via a genetic relatedness matrix. Existing techniques for mixed models are computationally expensive, however, and also implicitly make the unrealistic assumption that all markers have effect sizes drawn from independent, identical normal prior distributions. This proposal aims to extend mixed model methods by developing a new well-calibrated mixed model statistic that can be computed very quickly and tailored to the hypothesized genetic architecture underlying a trait to improve power.
The new method will be implemented in publicly available, open-source software and applied to analyze real data sets of medical interest. To achieve these aims, I will employ my strong mathematical background and previous experience developing computational methods and software for population genetics while receiving new training in medical genetics.

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Human Genome Research Institute (NHGRI)
Type: Postdoctoral Individual National Research Service Award (F32)
Project #: 5F32HG007805-03
Application #: 9186420
Study Section: Special Emphasis Panel (ZRG1)
Program Officer: Colley, Heather

Project Start: 2014-12-01
Project End: 2017-07-31
Budget Start: 2016-12-01
Budget End: 2017-07-31
Support Year: 3
Fiscal Year: 2017
Total Cost
Indirect Cost

Institution

Name: Harvard University
Department: Public Health & Prev Medicine
Type: Schools of Public Health
DUNS #: 149617367

City: Boston
State: MA
Country: United States
Zip Code: 02115

Related projects


NIH 2017 F32 HG	Fast and powerful extensions of mixed model methods for GWAS Loh, Po-Ru / Harvard University
NIH 2016 F32 HG	Fast and powerful extensions of mixed model methods for GWAS Loh, Po-Ru / Harvard University
NIH 2014 F32 HG	Fast and powerful extensions of mixed model methods for GWAS Loh, Po-Ru / Harvard University	$49,214

Publications

Loh, Po-Ru; Genovese, Giulio; Handsaker, Robert E et al. (2018) Insights into clonal haematopoiesis from 8,342 mosaic chromosomal alterations. Nature 559:350-355

Loh, Po-Ru; Palamara, Pier Francesco; Price, Alkes L (2016) Fast and accurate long-range phasing in a UK Biobank cohort. Nat Genet 48:811-6

Loh, Po-Ru; Danecek, Petr; Palamara, Pier Francesco et al. (2016) Reference-based phasing using the Haplotype Reference Consortium panel. Nat Genet 48:1443-1448

Galinsky, Kevin J; Bhatia, Gaurav; Loh, Po-Ru et al. (2016) Fast Principal-Component Analysis Reveals Convergent Evolution of ADH1B in Europe and East Asia. Am J Hum Genet 98:456-472

Tucker, George; Loh, Po-Ru; MacLeod, Iona M et al. (2015) Two-Variance-Component Model Improves Genetic Prediction in Family Datasets. Am J Hum Genet 97:677-90

Loh, Po-Ru; Bhatia, Gaurav; Gusev, Alexander et al. (2015) Contrasting genetic architectures of schizophrenia and other complex diseases using fast variance-components analysis. Nat Genet 47:1385-92

Bulik-Sullivan, Brendan K; Loh, Po-Ru; Finucane, Hilary K et al. (2015) LD Score regression distinguishes confounding from polygenicity in genome-wide association studies. Nat Genet 47:291-5

Loh, Po-Ru; Tucker, George; Bulik-Sullivan, Brendan K et al. (2015) Efficient Bayesian mixed-model analysis increases association power in large cohorts. Nat Genet 47:284-90

Hayeck, Tristan J; Zaitlen, Noah A; Loh, Po-Ru et al. (2015) Mixed model with correction for case-control ascertainment increases association power. Am J Hum Genet 96:720-30

Lipson, Mark; Loh, Po-Ru; Sankararaman, Sriram et al. (2015) Calibrating the Human Mutation Rate via Ancestral Recombination Density in Diploid Genomes. PLoS Genet 11:e1005550
