Genome-wide case-control association studies hold great promise for identifying disease-related genes and unveiling their underlying complex regulatory mechanisms. For common human diseases, the disease variants are often non-Mendelian: they have low penetrance and, when assessed individually, have little effect on a carrier's disease susceptibility, but they may interact with other variants in complex ways. Identifying associations that involve multi-locus interactions (epistasis) across the human genome is, however, computationally and statistically very challenging. Recent developments in statistical methods, such as stepwise logistic regression (Marchini et al., 2005) and the BEAM algorithm (Zhang and Liu, 2007), have demonstrated that genome-wide epistasis association mapping is not only feasible but can also be more fruitful than traditional approaches that focus exclusively on marginal effects.

In this project, we propose to further improve the BEAM algorithm by exploiting the linkage disequilibrium (LD) structure and haplotypes present in the human genome, greatly advancing our capability to detect subtle disease associations and interactions. Various haplotype-based association methods have been developed over the past decades, yet there is no consensus on the best approach. We will develop a flexible Bayesian framework for testing both marginal and interaction associations using haplotypes. In particular, the space of all possible haplotype combinations and their interactions will be explored efficiently via Markov chain Monte Carlo (MCMC) algorithms. In addition, we will treat markers that are not genotyped in an association study as missing data. By iteratively imputing the missing markers and testing their associations, we will be able to identify a small number of disease-associated markers (possibly including unobserved ones) that explain the observed genetic differences between patients and healthy controls.
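The idea of exploring association configurations with MCMC can be illustrated with a deliberately simplified sketch. It is not BEAM and models neither haplotypes nor interactions. Instead, each marker carries a hypothetical indicator for whether its allele frequency differs between cases and controls, allele frequencies get a uniform Beta(1, 1) prior, and a Metropolis-Hastings sampler flips one indicator at a time. The function names, the prior probability of association (0.05), and the example allele counts are illustrative assumptions.

```python
import math
import random


def log_beta(a: float, b: float) -> float:
    """Log of the Beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def log_marginal(case_alt, case_n, ctrl_alt, ctrl_n, associated):
    """Log marginal likelihood of allele counts under a uniform Beta(1, 1) prior.

    associated=False: cases and controls share one allele frequency.
    associated=True: cases and controls each have their own allele frequency.
    Binomial coefficients are omitted because they cancel between the two models.
    """
    if associated:
        return (log_beta(case_alt + 1, case_n - case_alt + 1)
                + log_beta(ctrl_alt + 1, ctrl_n - ctrl_alt + 1))
    k, n = case_alt + ctrl_alt, case_n + ctrl_n
    return log_beta(k + 1, n - k + 1)


def mcmc_inclusion(markers, prior_assoc=0.05, n_iter=20000, burn_in=2000, seed=1):
    """Metropolis-Hastings over per-marker association indicators (hypothetical toy model).

    markers: list of (case_alt, case_n, ctrl_alt, ctrl_n) allele counts.
    Returns, for each marker, the fraction of post-burn-in samples in which it is
    flagged as associated (an estimate of its posterior inclusion probability).
    """
    rng = random.Random(seed)
    state = [False] * len(markers)
    counts = [0] * len(markers)
    log_prior = {True: math.log(prior_assoc), False: math.log(1 - prior_assoc)}
    for it in range(n_iter):
        # Symmetric proposal: pick one marker uniformly and flip its indicator,
        # so the acceptance ratio is just the ratio of posterior densities.
        j = rng.randrange(len(markers))
        cur, new = state[j], not state[j]
        log_ratio = (log_marginal(*markers[j], new) + log_prior[new]
                     - log_marginal(*markers[j], cur) - log_prior[cur])
        if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
            state[j] = new
        if it >= burn_in:
            for i, s in enumerate(state):
                counts[i] += s
    kept = n_iter - burn_in
    return [c / kept for c in counts]


if __name__ == "__main__":
    example = [(300, 1000, 200, 1000),  # allele frequency 0.30 in cases vs 0.20 in controls
               (250, 1000, 250, 1000),  # no case-control difference
               (245, 1000, 255, 1000)]  # very small difference
    print(mcmc_inclusion(example))
```

With these counts the first marker should be flagged in nearly every sample and the other two almost never. A real epistasis search would replace the per-marker indicator with group memberships (for example, marginal versus interacting markers) and a joint likelihood over multi-locus genotypes or haplotypes, which is where MCMC becomes necessary rather than merely convenient.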
Unmeasured population structure in a case-control sample can also induce long-range correlations between SNPs that may be falsely reported as interactions. There is therefore an urgent need to improve the efficiency and accuracy of existing stratification detection algorithms. We propose to develop efficient Bayesian methods to identify population structure present in the case-control sample, and novel statistical models to adjust for the detected population effects. The software will be written in C++ for both Unix/Linux and Windows systems and made freely available to the community.
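Why unmeasured structure alone can create correlation between unlinked SNPs can be seen in a two-subpopulation mixture. If a fraction w of the sample comes from subpopulation A and the rest from B, and the two SNPs are independent within each subpopulation, the covariance of allele indicators in the pooled sample is w(1 - w)(p_A - p_B)(q_A - q_B), which is nonzero whenever both SNPs differ in frequency between the subpopulations. The hypothetical helper below computes the resulting correlation; the two-subpopulation setup and all names are illustrative assumptions, not the proposal's model.

```python
import math


def pooled_allele_correlation(w, p_a, q_a, p_b, q_b):
    """Correlation between allele indicators at two unlinked SNPs in a pooled sample.

    w: fraction of the sample drawn from subpopulation A (the rest from B).
    p_a, p_b: allele frequency at SNP 1 in subpopulations A and B.
    q_a, q_b: allele frequency at SNP 2 in subpopulations A and B.
    Within each subpopulation the two SNPs are assumed independent (no LD).
    """
    p = w * p_a + (1 - w) * p_b  # pooled allele frequency at SNP 1
    q = w * q_a + (1 - w) * q_b  # pooled allele frequency at SNP 2
    # E[X1 X2] - E[X1] E[X2] reduces to this term: the covariance comes
    # entirely from frequency differences between the subpopulations.
    cov = w * (1 - w) * (p_a - p_b) * (q_a - q_b)
    return cov / math.sqrt(p * (1 - p) * q * (1 - q))


if __name__ == "__main__":
    # Equal-sized subpopulations with frequencies 0.8 vs 0.2 at both SNPs.
    print(pooled_allele_correlation(0.5, 0.8, 0.8, 0.2, 0.2))  # about 0.36
```

A correlation of roughly 0.36 between SNPs that may lie on different chromosomes is exactly the kind of signal an interaction scan could misreport, which is why the detected structure has to be modeled or adjusted for.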

National Institutes of Health (NIH)
National Human Genome Research Institute (NHGRI)
Research Project (R01)
Study Section: Special Emphasis Panel (ZRG1-GGG-A (52))
Program Officer: Brooks, Lisa
Pennsylvania State University
Biostatistics & Other Math Sci
Schools of Arts and Sciences
University Park
United States
Zhang, Yu; Tian, Lifeng; Sleiman, Patrick et al. (2018) Bayesian analysis of genome-wide inflammatory bowel disease data sets reveals new risk loci. Eur J Hum Genet 26:265-274
Zhang, Yu; An, Lin; Yue, Feng et al. (2016) Jointly characterizing epigenetic dynamics across multiple human cell types. Nucleic Acids Res 44:6721-31
Lee, Yeonok; Ghosh, Debashis; Zhang, Yu (2014) Regression hidden Markov modeling reveals heterogeneous gene expression regulation: a case study in mouse embryonic stem cells. BMC Genomics 15:360
Lee, Yeonok; Ghosh, Debashis; Hardison, Ross C et al. (2014) MRHMMs: multivariate regression hidden Markov models and the variantS. Bioinformatics 30:1755-6
Chen, Kuan-Bei; Hardison, Ross; Zhang, Yu (2014) dCaP: detecting differential binding events in multiple conditions and proteins. BMC Genomics 15 Suppl 9:S12
Zhang, Yu; Ghosh, Soumitra; Hakonarson, Hakon (2014) Dynamic Bayesian testing of sets of variants in complex diseases. Genetics 198:867-78
Zhang, Yu (2013) De novo inference of stratification and local admixture in sequencing studies. BMC Bioinformatics 14 Suppl 5:S17
Lee, Yeonok; Ghosh, Debashis; Zhang, Yu (2013) Association testing to detect gene-gene interactions on sex chromosomes in trio data. Front Genet 4:239
Zhang, Yu (2013) A dynamic Bayesian Markov model for phasing and characterizing haplotypes in next-generation sequencing. Bioinformatics 29:878-85
Xu, Jialin; Zhang, Yu (2012) A generalized linear model for peak calling in ChIP-Seq data. J Comput Biol 19:826-38

Showing the most recent 10 out of 18 publications