Epistasis is the interaction between two or more genes to affect a phenotype. It is now widely accepted that epistasis plays an important role in susceptibility to many common diseases. The advent of high-throughput technologies has enabled genome-wide association studies (GWAS or GWA studies), and it is therefore important to be able to detect epistasis using GWAS data. So far, however, GWA studies have mainly focused on the association of a single gene or locus with a disease. The crucial challenge in analyzing epistasis using GWAS data is handling high-dimensional data sets efficiently, because the number of candidate locus combinations grows combinatorially with the number of loci examined together. The only feasible solution is to design efficient algorithms that find the most relevant epistatic relationships without an exhaustive investigation. To the Principal Investigator's knowledge, no current method can do this. This career award will investigate this problem.
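To make the scale of the dimensionality problem concrete, the following minimal sketch counts how many locus subsets an exhaustive search would have to examine. The panel size of 500,000 SNPs is an assumption chosen only for illustration and is not a figure from this project; the function name is likewise hypothetical.

```python
from math import comb


def n_candidate_subsets(n_snps: int, order: int) -> int:
    """Number of distinct subsets of exactly `order` loci among `n_snps`."""
    return comb(n_snps, order)


# Assumed panel size, for illustration only (not taken from the proposal).
N_SNPS = 500_000

for order in (1, 2, 3):
    # Each subset is one candidate model an exhaustive search must score.
    print(f"{order}-locus models: {n_candidate_subsets(N_SNPS, order):,}")
```

Under this assumption there are already about 1.25 x 10^11 two-locus models, and the count grows by several orders of magnitude for each additional locus, which is why exhaustive investigation quickly becomes impractical.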
The specific aims are as follows:
(Aim 1) Develop and evaluate efficient Bayesian network-based methods for learning candidate genes associated with diseases from GWAS data sets; such genes would provide candidates for follow-up biological studies.
(Aim 2) Implement the methods in a pilot GWAS system for use by researchers when conducting a GWAS.
(Aim 3) Develop simulated genome-wide data sets and evaluate the pilot system using these data sets.
(Aim 4) Conduct GWA studies concerning breast cancer and lung cancer.
Aim 1 will be addressed by developing a succinct Bayesian network model representing epistasis, developing efficient algorithms tailored to investigating such models, integrating the algorithms into methods for learning epistasis, and using simulated data sets to test the effectiveness of the methods and compare their performance with that of other methods.
Aim 2 will be met by implementing the methods in a pilot GWAS system.
Aim 3 will be satisfied by developing synthetic data sets similar to those found in GWA studies, and using them to evaluate the system.
Aim 4 will be achieved by conducting GWA studies concerning breast and lung cancer. By conducting these studies, we can (1) substantiate previous results concerning the genetic basis of these diseases and (2) possibly obtain interesting new findings pertaining to these diseases. The main hypothesis is that the proposed method will be an advance over existing methods: it will make it computationally feasible to learn epistatic relationships from genome-wide data and will therefore yield better discovery performance than existing methods.
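The ideas behind Aims 1 and 3 can be illustrated with a small, hedged sketch. It simulates a case-control data set in which two loci jointly determine risk, then scores every SNP pair as the parent set of a disease node using the K2 score, a standard Bayesian network scoring criterion. Everything here is an assumption made for illustration: the function names, the uniform genotype frequencies, the penetrance values, and the choice of K2 are not taken from the proposal, and the exhaustive pairwise scan is precisely the brute-force baseline that the proposed algorithms are meant to avoid.

```python
import math
import random
from collections import Counter
from itertools import combinations


def simulate(n_subjects, n_snps, causal=(0, 1), seed=0):
    """Simulate genotypes (0/1/2) and a binary disease status.

    Assumption: genotypes are uniform over {0, 1, 2}, and risk is high only
    when the two causal loci carry the same genotype, so neither locus has a
    marginal effect on its own.
    """
    rng = random.Random(seed)
    genotypes, disease = [], []
    for _ in range(n_subjects):
        g = [rng.choice((0, 1, 2)) for _ in range(n_snps)]
        risk = 0.8 if g[causal[0]] == g[causal[1]] else 0.2  # assumed penetrances
        genotypes.append(g)
        disease.append(1 if rng.random() < risk else 0)
    return genotypes, disease


def log_k2_score(genotypes, disease, parents, n_states=2):
    """Log K2 score of the model in which `parents` (SNP indices) point to disease."""
    joint = Counter()   # counts per (parent configuration, disease state)
    totals = Counter()  # counts per parent configuration
    for g, d in zip(genotypes, disease):
        cfg = tuple(g[i] for i in parents)
        joint[(cfg, d)] += 1
        totals[cfg] += 1
    # Unobserved configurations contribute zero to the log score.
    score = sum(math.lgamma(n_states) - math.lgamma(n + n_states)
                for n in totals.values())
    score += sum(math.lgamma(n + 1) for n in joint.values())
    return score


N_SNPS = 20  # kept tiny so that the exhaustive scan below is cheap
genotypes, disease = simulate(2000, N_SNPS)
best_pair = max(combinations(range(N_SNPS), 2),
                key=lambda pair: log_k2_score(genotypes, disease, pair))
print("highest-scoring pair:", best_pair)
```

With only 20 SNPs the scan covers 190 pairs; at genome-wide scale the same loop would be infeasible, which is the motivation for a search strategy that does not enumerate every combination.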

Public Health Relevance

Learning gene-gene interactions from genome-wide association study (GWAS) data is an important and challenging task in genetic epidemiology. This project will develop and evaluate a pilot GWAS system for performing this task. Advances in analyzing GWAS data sets could enable us to learn the genetic basis of many diseases and thereby substantially improve the quality of personalized patient care.

National Institutes of Health (NIH)
National Library of Medicine (NLM)
Career Transition Award (K99)
Study Section: Special Emphasis Panel (ZLM1-ZH-C (M3))
Program Officer: Ye, Jane
University of Pittsburgh
Schools of Medicine
United States
Cai, Binghuang; Jiang, Xia (2014) A novel artificial neural network method for biomedical prediction based on matrix pseudo-inversion. J Biomed Inform 48:114-21
Jiang, Xia; Neapolitan, Richard E (2012) Mining pure, strict epistatic interactions from high-dimensional datasets: ameliorating the curse of dimensionality. PLoS One 7:e46771
Jiang, Xia; Barmada, M Michael; Cooper, Gregory F et al. (2011) A Bayesian method for evaluating and discovering disease loci associations. PLoS One 6:e22075
Jiang, Xia; Neapolitan, Richard E; Barmada, M Michael et al. (2011) Learning genetic epistasis using Bayesian network scoring criteria. BMC Bioinformatics 12:89
Jiang, Xia; Barmada, M Michael; Becich, Michael J (2011) Evaluating de novo locus-disease discoveries in GWAS using the signal-to-noise ratio. AMIA Annu Symp Proc 2011:617-24
Jiang, Xia; Barmada, M Michael; Visweswaran, Shyam (2010) Identifying genetic interactions in genome-wide data using Bayesian networks. Genet Epidemiol 34:575-81
Jiang, Xia; Neapolitan, Richard E; Barmada, M Michael et al. (2010) A fast algorithm for learning epistatic genomic relationships. AMIA Annu Symp Proc 2010:341-5