New Bayesian algorithms for genome-wide association mapping

Zhang, Yu

Abstract

Genome-wide association studies hold great promise for revealing the genetic architectures underlying complex human diseases. Disease variants are often non-Mendelian: individually they show low penetrance and little effect on the disease, but they interact with each other and with the environment in unknown ways. With recent high-throughput sequencing technology, far more data are being generated at the genome scale, including not only genetic variants but also regulatory elements at the individual level. Regulatory factors are known to interact and act as mediators between sequence variation and phenotypic diversity. Multi-variant disease mapping is therefore becoming increasingly important for future genome-wide association studies. It is also hoped that, by collecting all variants in the human genome, we can identify the true causative variants, so that functional evaluation and validation experiments can be targeted precisely at the identified sites to reveal the biological mechanisms through which they contribute to disease. Identifying multi-variant associations is extremely challenging, and current algorithms remain very limited. In particular, high-throughput sequencing data are now routinely generated in disease studies. The resulting complete sets of variants are highly dependent; existing methods face substantial computational difficulties with such data, which makes it extremely difficult to pinpoint the true disease variants. It is also very challenging to detect disease associations from rare variants, which are nevertheless more abundant in the human genome and could be the main contributors to complex human diseases. We propose to develop advanced algorithms to tackle these problems, improving both the power and the computational efficiency of whole-genome multi-variant mapping. We also propose generalized methods to jointly test common and rare variants under a coherent, fully probabilistic model.
Our approach automatically groups variants for joint testing, accounts for dependence, incorporates biological priors, and identifies causative variants. We further extend the methods via non-parametric Bayesian techniques to integrate various public databases into disease mapping. The new algorithms will greatly enhance researchers' capability to analyze high-throughput genetic and genomic data. The software will be freely distributed to the community through the PI's website and the Galaxy system hosted at Penn State.

Public Health Relevance

The goal of the project is to develop powerful and efficient new statistical tools that advance our capability to analyze genome-wide data sets for complex human diseases, and to better integrate publicly available knowledge bases into disease association mapping. Tools developed in this project will be freely distributed to the research community to facilitate discovery and improve understanding of the regulatory mechanisms underlying inherited complex human phenotypes.

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Human Genome Research Institute (NHGRI)
Type: Research Project (R01)
Project #: 5R01HG004718-05
Application #: 8532953
Study Section: Genomics, Computational Biology and Technology Study Section (GCAT)
Program Officer: Brooks, Lisa

Project Start: 2008-08-15
Project End: 2015-06-30
Budget Start: 2013-07-01
Budget End: 2014-06-30
Support Year: 5
Fiscal Year: 2013
Total Cost: $178,195
Indirect Cost: $53,195

Institution

Name: Pennsylvania State University
Department: Biostatistics & Other Math Sci
Type: Schools of Arts and Sciences
DUNS #: 003403953

City: University Park
State: PA
Country: United States
Zip Code: 16802

Related projects


NIH 2014 R01 HG	New Bayesian algorithms for genome-wide association mapping Zhang, Yu / Pennsylvania State University	$183,472
NIH 2013 R01 HG	New Bayesian algorithms for genome-wide association mapping Zhang, Yu / Pennsylvania State University	$178,195
NIH 2012 R01 HG	New Bayesian algorithms for genome-wide association mapping Zhang, Yu / Pennsylvania State University	$178,651
NIH 2010 R01 HG	Bayesian Methods for Epistasis Association Mapping Zhang, Yu / Pennsylvania State University	$140,118
NIH 2009 R01 HG	Bayesian Methods for Epistasis Association Mapping Zhang, Yu / Pennsylvania State University	$141,721
NIH 2008 R01 HG	Bayesian Methods for Epistasis Association Mapping Zhang, Yu / Pennsylvania State University	$141,468

Publications

Zhang, Yu; Tian, Lifeng; Sleiman, Patrick et al. (2018) Bayesian analysis of genome-wide inflammatory bowel disease data sets reveals new risk loci. Eur J Hum Genet 26:265-274

Zhang, Yu; An, Lin; Yue, Feng et al. (2016) Jointly characterizing epigenetic dynamics across multiple human cell types. Nucleic Acids Res 44:6721-31

Chen, Kuan-Bei; Hardison, Ross; Zhang, Yu (2014) dCaP: detecting differential binding events in multiple conditions and proteins. BMC Genomics 15 Suppl 9:S12

Zhang, Yu; Ghosh, Soumitra; Hakonarson, Hakon (2014) Dynamic Bayesian testing of sets of variants in complex diseases. Genetics 198:867-78

Lee, Yeonok; Ghosh, Debashis; Zhang, Yu (2014) Regression hidden Markov modeling reveals heterogeneous gene expression regulation: a case study in mouse embryonic stem cells. BMC Genomics 15:360

Lee, Yeonok; Ghosh, Debashis; Hardison, Ross C et al. (2014) MRHMMs: multivariate regression hidden Markov models and the variantS. Bioinformatics 30:1755-6

Zhang, Yu (2013) De novo inference of stratification and local admixture in sequencing studies. BMC Bioinformatics 14 Suppl 5:S17

Lee, Yeonok; Ghosh, Debashis; Zhang, Yu (2013) Association testing to detect gene-gene interactions on sex chromosomes in trio data. Front Genet 4:239

Zhang, Yu (2013) A dynamic Bayesian Markov model for phasing and characterizing haplotypes in next-generation sequencing. Bioinformatics 29:878-85

Xu, Jialin; Zhang, Yu (2012) A generalized linear model for peak calling in ChIP-Seq data. J Comput Biol 19:826-38
