Genome-wide association studies (GWAS) have identified tens of thousands of genetic variants associated with hundreds of phenotypes and diseases, and in some cases these findings have benefited patients by providing novel biomarkers and therapeutic targets. However, investigation of complex traits often suffers from limited statistical power due to polygenicity, high dimensionality, and moderate sample size. While it is practically challenging and costly to recruit enough patients to identify all associated genetic variants, we recently showed that statistical power to identify risk-associated genetic variants can be significantly increased by 1) considering the genetic basis shared among multiple phenotypes, namely pleiotropy, and 2) incorporating genomic and genetic annotation data. However, effective integration of these datasets becomes statistically more challenging as the number of genetic studies and annotation datasets increases. The objective of this proposal is to develop statistical methods and software to improve identification and interpretation of risk variants and to promote understanding of genetic relationships among phenotypes. This objective will be attained by pursuing four specific aims.
In Aim 1, we will develop a Bayesian graphical model to identify risk variants and construct a phenotype network, by integrating multiple GWAS datasets with various annotation data.
In Aim 2, we will develop a Bayesian graphical model to build a phenotype network from biomedical literature.
In Aim 3, we will develop a statistical method to construct meta-annotations that effectively summarize high-dimensional annotation data without losing interpretability.
In Aim 4, we will apply these methods to genetic studies of vascular complications and autoimmune diseases in African American populations, with PubMed literature and various annotation datasets. The proposed research is innovative because it proposes a novel statistical framework that integrates multiple GWAS, biomedical literature, and annotation datasets to improve identification and interpretation of risk variants. The proposed research is significant because it is expected to help improve diagnosis and treatment of diseases with more effective identification of risk variants and enhanced understanding of common etiology among diseases.

Public Health Relevance

We will develop novel statistical methods and software to improve identification and interpretation of risk variants. The output from these methods may be useful for developing treatments that overlap across diseases. Applying these methods to genetic studies of African Americans can help in developing more effective disease prevention and intervention strategies for these populations.

National Institutes of Health (NIH)
National Institute of General Medical Sciences (NIGMS)
Research Project (R01)
Study Section: Special Emphasis Panel (ZGM1-BBCB-5 (BM))
Program Officer: Marcus, Stephen
Medical University of South Carolina
Public Health & Prev Medicine
Schools of Medicine
United States
Pandey, Janardan P; Namboodiri, Aryan M; Wolf, Bethany et al. (2018) Endogenous antibody responses to mucin 1 in a large multiethnic cohort of patients with breast cancer and healthy controls: Role of immunoglobulin and Fcγ receptor genes. Immunobiology 223:178-182
Lin, Ching Ying; Kwon, Hyunwoo; Rangel Rivera, Guillermo O et al. (2018) Sex Differences in Using Systemic Inflammatory Markers to Prognosticate Patients with Head and Neck Squamous Cell Carcinoma. Cancer Epidemiol Biomarkers Prev 27:1176-1185
Kortemeier, Emma; Ramos, Paula S; Hunt, Kelly J et al. (2018) ShinyGPA: An interactive visualization toolkit for investigating pleiotropic architecture using GWAS datasets. PLoS One 13:e0190949
Kim, Hang J; Yu, Zhenning; Lawson, Andrew et al. (2018) Improving SNP prioritization and pleiotropic architecture estimation by incorporating prior knowledge using graph-GPA. Bioinformatics 34:2139-2141
Renaud, Ludivine; Silveira, Willian A da; Hazard, E Starr et al. (2017) The Plasticizer Bisphenol A Perturbs the Hepatic Epigenome: A Systems Level Analysis of the miRNome. Genes (Basel) 8:
Chung, Dongjun; Kim, Hang J; Zhao, Hongyu (2017) graph-GPA: A graphical model for prioritizing GWAS results and investigating pleiotropic architecture. PLoS Comput Biol 13:e1005388
Chung, Dongjun; Lawson, Andrew; Zheng, W Jim (2017) A statistical framework for biomedical literature mining. Stat Med 36:3461-3474
Davis-Turak, Jeremy; Courtney, Sean M; Hazard, E Starr et al. (2017) Genomics pipelines and data integration: challenges and opportunities in the research setting. Expert Rev Mol Diagn 17:225-237
Wei, Wei; Ramos, Paula S; Hunt, Kelly J et al. (2016) GPA-MDS: A Visualization Approach to Investigate Genetic Architecture among Phenotypes Using GWAS Results. Int J Genomics 2016:6589843