We propose developing, evaluating, and comparing biologically motivated statistical methods for analyzing and interpreting heterogeneous and multiple types of genomic data. The overarching theme is to account for genetic heterogeneity in complex diseases while maximizing the use of existing knowledge and data to boost statistical power for new discovery. To that end, we propose novel and powerful statistical methods that are adaptive and capable of integrating genotype data with gene pathway and functional annotations and with other types of data, such as gene expression data. Specifically, we propose 1) developing powerful and flexible data-adaptive multilocus tests to detect genetic association with complex diseases, which can further integrate genotype and gene expression data; 2) extending the adaptive tests to gene pathway analysis; 3) extending the adaptive tests to detect multiple trait-multilocus association; and 4) developing a novel and general framework for finite mixture regressions to account for genetic heterogeneity. We will apply the proposed methods, alongside existing popular methods, to the ARIC data, and we will implement the proposed methods in freely available software.
The proposed research is expected not only to advance statistical methodology and theory for the analysis of heterogeneous and multiple types of genomic data, but also to contribute valuable statistical and computational tools for elucidating the genetic architectures of complex diseases and traits.
Zhang, Yiwei; Pan, Wei (2015) Principal component regression and linear mixed model in association analysis of structured samples: competitors or complements? Genet Epidemiol 39:149-55