Array- and sequencing-based association studies have identified many loci harboring genetic variants associated with complex traits and common diseases. Altogether, these associated variants explain only a small proportion of heritability, suggesting that most traits and diseases have a polygenic background and are influenced by many variants with small effects. Early attempts to model polygenic complex traits, notably via linear mixed models (LMMs) and the best linear unbiased predictor (BLUP), have shown promising outcomes for estimating chip heritability, identifying causal variants, and predicting disease risks. However, statistical methods for modeling polygenic architecture remain in their infancy. In particular, existing methods rely on simple effect size assumptions, are neither flexible nor adaptive to the underlying genetic architecture of a given trait or disease, and hence cannot take full advantage of the polygenic nature of most traits and diseases. To increase the power of association tests and enable more precise phenotype and risk prediction, we propose to develop a suite of novel statistical methods to accurately and flexibly model the polygenic architecture. These new methods will facilitate evaluation and integration of variant functional annotations, multiple phenotype association mapping, and phenotype and risk prediction in association studies. In particular, we will (1) develop methods to evaluate and integrate variant genomic functional annotations to better understand the polygenic architecture of traits and diseases, and enable powerful association mapping; (2) develop strategies for association mapping with multiple correlated phenotypes to identify pleiotropic associations by taking advantage of the shared polygenic background among phenotypes; and (3) develop methods to flexibly model polygenic architecture and use all variants jointly to achieve accurate phenotype and risk prediction.
We will develop efficient algorithms to accompany these methods and implement them in free open-source software. We will perform rigorous simulations and comparisons to evaluate our methods. Finally, we will perform in-depth analyses of several large-scale real data sets, including data from the Global Lipids Genetics Consortium, T2D-GENES and METSIM projects, to demonstrate the power of the proposed methods.

Public Health Relevance

We propose to develop new statistical methods to identify causal variants and predict disease risks for array- and sequencing-based association studies. To increase the power of association tests and enable more precise phenotype and risk prediction, we will take advantage of the polygenic nature of most traits and diseases by accurately and flexibly modeling the underlying polygenic architecture. These new methods will facilitate integrative analysis with variant functional annotations, multiple phenotype association mapping, and phenotype and risk prediction in association studies. Application of the methods to array-based and whole-genome sequencing-based association studies will help identify new associations and facilitate the development of precision medicine.

Agency
National Institutes of Health (NIH)
Institute
National Human Genome Research Institute (NHGRI)
Type
Research Project (R01)
Project #
5R01HG009124-04
Application #
9912184
Study Section
Genomics, Computational Biology and Technology Study Section (GCAT)
Program Officer
Sofia, Heidi J
Project Start
2017-06-14
Project End
2022-04-30
Budget Start
2020-05-01
Budget End
2021-04-30
Support Year
4
Fiscal Year
2020
Total Cost
Indirect Cost
Name
University of Michigan Ann Arbor
Department
Type
Schools of Public Health
DUNS #
073133571
City
Ann Arbor
State
MI
Country
United States
Zip Code
48109