Genome-wide case-control association studies hold great promise for identifying disease-related genes and unveiling their underlying complex regulatory mechanisms. For common human diseases, the disease variants are often non-Mendelian: they have low penetrance and, when assessed individually, have little effect on the carrier's disease susceptibility, but they may interact with other variants in complex ways. Identifying multi-locus interaction (epistasis) associations within the human genome is, however, computationally and statistically very challenging. Recent developments in statistical methods, such as stepwise logistic regression (Marchini et al. 2005) and the BEAM algorithm (Zhang and Liu, 2007), have demonstrated that genome-wide epistasis association mapping is not only feasible but can also be more fruitful than traditional approaches that focus exclusively on marginal effects. In this proposal, we plan to further improve the BEAM algorithm by exploiting the LD structure and haplotypes inherent in the human genome, greatly advancing our capability to detect subtle disease associations and interactions. Various haplotype-based association methods have been developed over the past decades, yet there is no consensus on the best approach. We will develop a flexible Bayesian framework for testing both marginal and interaction associations using haplotypes. In particular, all possible haplotype combinations and their interactions will be explored efficiently via Markov chain Monte Carlo (MCMC) algorithms. In addition, we will treat markers that are not genotyped in an association study as missing data. By iteratively imputing the missing markers and testing their associations, we aim to identify a small number of disease-associated markers (possibly including unobserved ones) that explain the observed genetic differences between patients and healthy controls.
In addition, unmeasured population structure in a case-control sample can induce long-range correlations between SNPs that may be falsely reported as interactions. There is therefore an urgent need to improve the efficiency and accuracy of existing stratification detection algorithms. We propose to develop efficient Bayesian methods to identify the population structure present in a case-control sample, together with novel statistical models to adjust for the detected population effects. The software will be written in C++ for both Unix/Linux and Windows systems and will be freely available to the community.