Detecting Genome-Wide Epistasis with Efficient Bayesian Network Learning

Epistasis is the interaction between two or more genes to affect phenotype. It is now widely accepted that epistasis plays an important role in susceptibility to many common diseases. The advent of high-throughput technologies has enabled genome-wide association studies (GWAS or GWA studies), and there is a compelling need to detect epistasis using GWAS data. So far, however, GWA studies have mainly focused on the association of a single gene or locus with a disease. The crucial challenge in analyzing epistasis using GWAS data is finding a way to efficiently handle high-dimensional data sets. The only feasible solution is to design efficient algorithms that find the most relevant epistatic relationships without an exhaustive investigation. To the Principal Investigator's knowledge, no current method can do this. This career award will investigate this problem.
The specific aims are as follows:
(Aim 1) develop and evaluate efficient Bayesian network-based methods for learning candidate genes associated with diseases from GWAS data sets, where such genes would provide candidates for follow-up biological studies; (Aim 2) implement the methods in a pilot GWAS system for use by researchers when conducting a GWAS; (Aim 3) develop simulated genome-wide data sets and evaluate the pilot system using these data sets; and (Aim 4) conduct GWA studies concerning breast cancer and lung cancer.
Aim 1 will be addressed by developing a succinct Bayesian network model representing epistasis, designing efficient algorithms tailored to investigating such models, integrating the algorithms into methods for learning epistasis, and using simulated data sets to test the effectiveness of the methods and compare their performance to that of other methods.
Aim 2 will be met by implementing the methods in a pilot GWAS system.
Aim 3 will be satisfied by developing synthetic data sets similar to those found in GWA studies, and using them to evaluate the system.
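As an illustration of what such a synthetic data set might look like, the following is a minimal sketch, not the project's actual simulator. It assumes biallelic SNPs coded 0/1/2 under Hardy-Weinberg equilibrium and a purely hypothetical penetrance model in which disease risk is elevated only when both causal loci carry at least one minor allele. The function name `simulate_gwas` and all parameter values are assumptions made for illustration, and the output is a population sample rather than a balanced case-control design.

```python
import random


def simulate_gwas(n_samples, n_snps, causal=(0, 1), maf=0.3,
                  base_risk=0.05, epistatic_risk=0.6, seed=0):
    """Simulate genotypes and a binary phenotype with one epistatic SNP pair.

    All parameter values are hypothetical defaults chosen for illustration.
    """
    rng = random.Random(seed)  # seeded so the data set is reproducible
    genotypes, phenotype = [], []
    for _ in range(n_samples):
        # Each genotype is the number of minor alleles (0, 1, or 2),
        # drawn as two independent alleles with frequency `maf`.
        row = [int(rng.random() < maf) + int(rng.random() < maf)
               for _ in range(n_snps)]
        # Risk is raised only when every causal locus carries a minor allele.
        carrier = all(row[s] > 0 for s in causal)
        risk = epistatic_risk if carrier else base_risk
        genotypes.append(row)
        phenotype.append(int(rng.random() < risk))
    return genotypes, phenotype
```

Because the causal pair is known by construction, a data set generated this way lets a detection method be scored on whether it recovers that pair among the noise loci.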
Aim 4 will be achieved by conducting GWA studies concerning breast and lung cancer. These studies can (1) substantiate previous results concerning the genetic basis of these diseases and (2) possibly yield interesting new findings pertaining to these diseases. The main hypothesis is that the proposed method will advance on existing methods by making it computationally feasible to learn epistatic relationships from genome-wide data, and that it will therefore yield better discovery performance.
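To make the computational issue concrete, the sketch below scores a candidate epistatic model with a standard Bayesian network score and then searches all SNP pairs exhaustively. It is a sketch under stated assumptions, not the method proposed here: it assumes a model in which a set of SNPs are the parents of a binary disease node, and it uses the K2 score (a Bayesian Dirichlet score with uniform priors) as one common choice. The names `k2_log_score` and `best_pair` are hypothetical. The exhaustive loop in `best_pair` is exactly what does not scale: for an illustrative 500,000 SNPs there are roughly 1.25 × 10^11 pairs, and far more higher-order combinations.

```python
from collections import Counter
from itertools import combinations
from math import lgamma


def k2_log_score(genotypes, phenotype, snps, n_states=2):
    """Log K2 score of the model in which `snps` are parents of the disease node.

    genotypes: list of rows, one per sample, each a list of genotype codes.
    phenotype: list of disease states in range(n_states), one per sample.
    """
    counts = Counter()  # (parent configuration, disease state) -> count
    totals = Counter()  # parent configuration -> count
    for row, d in zip(genotypes, phenotype):
        config = tuple(row[s] for s in snps)
        counts[(config, d)] += 1
        totals[config] += 1
    score = 0.0
    # Unobserved parent configurations contribute zero, so only observed
    # configurations need to be visited.
    for config, n_j in totals.items():
        score += lgamma(n_states) - lgamma(n_j + n_states)
        for d in range(n_states):
            score += lgamma(counts[(config, d)] + 1)
    return score


def best_pair(genotypes, phenotype):
    """Exhaustively score every SNP pair (quadratic in the number of SNPs)."""
    n_snps = len(genotypes[0])
    return max(combinations(range(n_snps), 2),
               key=lambda pair: k2_log_score(genotypes, phenotype, pair))
```

A joint score of this kind can reward a pair of loci that shows no marginal effect for either locus alone, which is why single-locus screening can miss epistasis and why a search strategy that avoids enumerating every combination is needed.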

Public Health Relevance

Learning gene-gene interactions from genome-wide association study (GWAS) data is an important and challenging task in genetic epidemiology. This project will develop and evaluate a pilot GWAS system for performing this task. Advances in analyzing GWAS data sets could enable us to learn the genetic basis of many diseases and thereby substantially improve the quality of personalized patient care.

National Institutes of Health (NIH)
National Library of Medicine (NLM)
Research Transition Award (R00)
Study Section: Special Emphasis Panel (NSS)
Program Officer: Ye, Jane
University of Pittsburgh
Schools of Medicine
United States
Rathnam, Chandramouli; Lee, Sanghoon; Jiang, Xia (2017) An algorithm for direct causal learning of influences on patient outcomes. Artif Intell Med 75:1-15
Cai, Binghuang; Jiang, Xia (2016) Computational methods for ubiquitination site prediction using physicochemical properties of protein sequences. BMC Bioinformatics 17:116
Zeng, Zexian; Jiang, Xia; Neapolitan, Richard (2016) Discovering causal interactions using Bayesian network scoring and information gain. BMC Bioinformatics 17:221
Hill, Steven M; Heiser, Laura M; Cokelaer, Thomas et al. (2016) Inferring causal molecular networks: empirical assessment through a community-based effort. Nat Methods 13:310-8
Tenenbaum, Jessica D; Avillach, Paul; Benham-Hutchins, Marge et al. (2016) An informatics research agenda to support precision medicine: seven key areas. J Am Med Inform Assoc 23:791-5
Neapolitan, Richard; Jiang, Xia; Ladner, Daniela P et al. (2016) A Primer on Bayesian Decision Analysis With an Application to a Kidney Transplant Decision. Transplantation 100:489-96
Jiang, Xia; Neapolitan, Richard E (2015) Evaluation of a two-stage framework for prediction using big genomic data. Brief Bioinform 16:912-21
Jiang, Xia; Neapolitan, Richard E (2015) LEAP: biomarker inference through learning and evaluating association patterns. Genet Epidemiol 39:173-84
Jiang, Xia; Jao, Jeremy; Neapolitan, Richard (2015) Learning Predictive Interactions Using Information Gain and Bayesian Network Scoring. PLoS One 10:e0143247
Neapolitan, Richard; Horvath, Curt M; Jiang, Xia (2015) Pan-cancer analysis of TCGA data reveals notable signaling pathways. BMC Cancer 15:516
