Array- and sequencing-based association studies have identified many loci harboring genetic variants associated with complex traits and common diseases. Together, these associated variants explain only a small proportion of heritability, suggesting that most traits and diseases have a polygenic background and are influenced by many variants with small effects. Early attempts to model polygenic complex traits, notably via linear mixed models (LMMs) and the best linear unbiased predictor (BLUP), have shown promising outcomes for estimating chip heritability, identifying causal variants, and predicting disease risks. However, statistical methods for modeling polygenic architecture remain in their infancy. In particular, existing methods rely on simple effect size assumptions, are neither flexible nor adaptive to the underlying genetic architecture of a given trait or disease, and hence cannot take full advantage of the polygenic nature of most traits and diseases. To increase the power of association tests and enable more precise phenotype and risk prediction, we propose to develop a suite of novel statistical methods to accurately and flexibly model the polygenic architecture. These new methods will facilitate evaluation and integration of variant functional annotations, multiple phenotype association mapping, and phenotype and risk prediction in association studies. In particular, we will (1) develop methods to evaluate and integrate variant genomic functional annotations to better understand the polygenic architecture of traits and diseases and to enable powerful association mapping; (2) develop strategies for association mapping with multiple correlated phenotypes to identify pleiotropic associations by taking advantage of the shared polygenic background among phenotypes; and (3) develop methods to flexibly model polygenic architecture and use all variants jointly to achieve accurate phenotype and risk prediction.
We will develop efficient algorithms to accompany these methods and implement them in free, open-source software. We will perform rigorous simulations and comparisons to evaluate our methods. Finally, we will perform in-depth analyses of several large-scale real data sets, including data from the Global Lipids Genetics Consortium and the T2D-GENES and METSIM projects, to demonstrate the power of the proposed methods.
We propose to develop new statistical methods to identify causal variants and predict disease risks for array- and sequencing-based association studies. To increase the power of association tests and enable more precise phenotype and risk prediction, we will take advantage of the polygenic nature of most traits and diseases by accurately and flexibly modeling the underlying polygenic architecture. These new methods will facilitate integrative analysis with variant functional annotations, multiple phenotype association mapping, and phenotype and risk prediction in association studies. Application of the methods to array-based and whole-genome sequencing-based association studies will help identify new associations and facilitate the development of precision medicine.