A central problem in molecular biology is to understand how proteins and DNA interact to regulate gene expression and influence phenotypes. With advanced sequencing technologies, massive amounts of genetic, epigenetic, and genomic data have been generated rapidly. Exploiting hundreds of genome-wide data sets across many samples provides an unprecedented opportunity to study the interplay among regulatory marks and their impact on gene expression. By comparing genome-wide features across samples, key regulators functioning in specific cell types can be identified with substantial power and resolution. New hypotheses for the mechanisms of gene regulation during cell differentiation can be derived and tested, which will in turn illuminate previously intractable issues in the genetics of disease susceptibility. While numerous computational efforts have been made to study epigenetic dynamics and pinpoint the locations of epigenetic events, a unified and powerful framework has been lacking for analyzing multiple genomes jointly in a way that accounts for both the position specificity and the cell type specificity of those events. We recently introduced a new Bayesian method called IDEAS (integrative and discriminative epigenome annotation system) that satisfactorily addresses this need, and using independent experimental data we have demonstrated its superior performance over existing state-of-the-art algorithms. In this project, we aim to substantially expand the scope and applicability of the IDEAS method, and to develop a powerful software tool for public use. In particular, we propose to 1) segment genomes with missing tracks without data imputation and integrate results between studies; 2) model covariate effects and detect epigenomic association; 3) infer fine-grained local cell type relationships; and 4) integrate chromatin conformation data to improve segmentation. In collaboration with Dr. Hardison (co-I), we will further evaluate the accuracy of a subset of our predictions experimentally. The success of this project will benefit method development, generate new resources, and, importantly, advance our capability in large-scale data integration towards understanding the roles of (epi)genetics in gene regulation and complex disease.

Public Health Relevance

The goal of the project is to develop advanced and efficient computational tools for jointly studying epigenetic dynamics and differential gene regulation across many cell types. The results will advance our capability to analyze high-throughput sequencing data sets in gene regulation and biomedical studies. Tools developed in this project will be freely available to the community to facilitate biological discovery towards understanding the mechanisms of gene regulation and their impact on human disease.

Agency
National Institutes of Health (NIH)
Institute
National Institute of General Medical Sciences (NIGMS)
Type
Research Project (R01)
Project #
5R01GM121613-03
Application #
9751894
Study Section
Biodata Management and Analysis Study Section (BDMA)
Program Officer
Resat, Haluk
Project Start
2017-08-01
Project End
2020-07-31
Budget Start
2019-08-01
Budget End
2020-07-31
Support Year
3
Fiscal Year
2019
Total Cost
Indirect Cost
Name
Pennsylvania State University
Department
Biostatistics & Other Math Sci
Type
Schools of Arts and Sciences
DUNS #
003403953
City
University Park
State
PA
Country
United States
Zip Code
16802