An essential problem in molecular biology is to understand how proteins and DNA interact to regulate gene expression and influence phenotypes. With advanced sequencing technologies, massive amounts of genetic, epigenetic, and genomic data have been rapidly generated. Exploiting hundreds of genome-wide data sets across many samples provides an unprecedented opportunity to study the interplay among regulatory marks and their impact on gene expression. By comparing genome-wide features across samples, key regulators functioning in specific cell types can be identified with substantial power and resolution. New hypotheses for the mechanisms of gene regulation during cell differentiation can be derived and tested, which will in turn illuminate previously intractable issues in the genetics of disease susceptibility. While numerous computational efforts have been made to study epigenetic dynamics and pinpoint the locations of epigenetic events, there has been no unified and powerful framework for analyzing multiple genomes jointly in a way that accounts for both the position specificity and the cell type specificity of those events. We recently introduced a new Bayesian method called IDEAS (integrative and discriminative epigenome annotation system) that satisfactorily addresses this need, and using independent experimental data we have demonstrated its superior performance over existing state-of-the-art algorithms. In this project, we aim to substantially expand the scope and applicability of the IDEAS method, and to develop a powerful software tool for public use. In particular, we propose to 1) segment genomes with missing tracks without data imputation and integrate results between studies; 2) model covariate effects and detect epigenomic association; 3) infer fine-grained local cell type relationships; and 4) integrate chromatin conformation data to improve segmentation. In collaboration with Dr. Hardison (co-I), we will further evaluate the accuracy of a subset of our predictions experimentally. The success of this project will benefit method development, generate new resources, and, importantly, advance our capability in large-scale data integration towards understanding the roles of (epi)genetics in gene regulation and complex disease.

Public Health Relevance

The goal of the project is to develop advanced and efficient computational tools for jointly studying epigenetic dynamics and differential gene regulation across many cell types. The results will advance our capability to analyze high-throughput sequencing data sets in gene regulation and biomedical studies. Tools developed in this project will be freely available to the community to facilitate biological discovery towards understanding the mechanisms of gene regulation and their impact on human disease.

National Institutes of Health (NIH)
National Institute of General Medical Sciences (NIGMS)
Research Project (R01)
Study Section: Biodata Management and Analysis Study Section (BDMA)
Program Officer: Resat, Haluk
Pennsylvania State University
Biostatistics & Other Math Sci
Schools of Arts and Sciences
University Park
United States