Genome-wide association studies (GWAS) have identified tens of thousands of genetic variants associated with hundreds of phenotypes and diseases. In some cases, these findings have benefited patients by yielding novel biomarkers and therapeutic targets. However, investigations of complex traits often suffer from limited statistical power because of polygenicity, high dimensionality, and moderate sample sizes. Recruiting enough patients to identify all associated genetic variants is practically challenging and costly. We recently showed that statistical power to identify risk-associated genetic variants can be significantly increased by 1) considering the genetic basis shared among multiple phenotypes, namely pleiotropy, and 2) incorporating genomic and genetic annotation data. However, effectively integrating these datasets becomes statistically more challenging as the number of genetic studies and annotation datasets grows. The objective of this proposal is to develop statistical methods and software that improve the identification and interpretation of risk variants and promote understanding of the genetic relationships among phenotypes. We will attain this objective by pursuing four specific aims.
In Aim 1, we will develop a Bayesian graphical model that identifies risk variants and constructs a phenotype network by integrating multiple GWAS datasets with diverse annotation data.
In Aim 2, we will develop a Bayesian graphical model to build a phenotype network from biomedical literature.
In Aim 3, we will develop a statistical method to construct meta-annotations that effectively summarize high-dimensional annotation data without losing interpretability.
In Aim 4, we will apply these methods to genetic studies of vascular complications and autoimmune diseases in African American populations, using PubMed literature and various annotation datasets.
The proposed research is innovative because it introduces a novel statistical framework that integrates multiple GWAS, biomedical literature, and annotation datasets to improve the identification and interpretation of risk variants. The proposed research is significant because it is expected to help improve the diagnosis and treatment of diseases through more effective identification of risk variants and a better understanding of the etiology shared among diseases.
We will develop novel statistical methods and software to improve the identification and interpretation of risk variants. The output of these methods may be useful for developing treatments that apply across multiple diseases. Applying these methods to genetic studies of African Americans can help in developing more effective disease prevention and intervention strategies for this population.