The NIH Roadmap Epigenomics and ENCODE projects have generated a collection of 3000+ epigenomics datasets, including histone modification, DNA methylation, gene expression, and DNaseI hypersensitivity, profiled across 190 cell and tissue types. Novel computational analyses are needed to maximize the impact of this collection on the understanding of gene regulation, cellular differentiation, and human health. To address this challenge, we will develop new methods for epigenomic analysis, building on our extensive experience interpreting epigenomic information and on our preliminary studies building chromatin states, activity clusters, and regulatory motif maps for the Roadmap Epigenomics and ENCODE datasets.
In Aim 1, we will characterize epigenomic differences and changes during lineage differentiation. We will develop new tools for systematic comparison of groups of epigenomes that directly exploit the complexity of epigenomic datasets; methods for clustering epigenomes into developmental lineages based on automatically learned, diverse epigenomic features that distinguish them; and methods that learn the unidirectional epigenomic changes that pluripotent cells undergo during lineage commitment, both to gain insight into differentiation and to automatically classify lineages and differentiation trajectories. In Aim 2, we will characterize higher-order chromatin architecture and chromatin conformation to enable systematic interpretation of cis-regulatory modules. We will develop a novel statistical approach for enhancer-enhancer and enhancer-gene linking that reveals interacting regions and their target genes based on their coordinated activity patterns across cell and tissue types; we will train a supervised learning method for predicting both constitutive and tissue-specific chromatin conformation from chromatin state information, individual chromatin marks, genomic distance, activity, regulatory motif information, and DNA sequence; and we will use these higher-order interaction maps to predict gene expression levels based on the combined action of multiple regulatory regions and to define the cis-regulatory architecture of each gene in the human genome. The resulting resources will be invaluable for studies of gene regulation, by revealing the set of regulatory elements linked to each gene, and for the interpretation of genetic studies, by revealing the regulatory elements that jointly regulate each target gene and the potential target genes of non-coding variants associated with human disease.
The NIH ENCODE and Roadmap Epigenomics projects have generated a collection of 4000+ epigenomics datasets across 200+ cell and tissue types. This collection can be invaluable to the scientific community, but novel computational analyses are needed to maximize its impact on the understanding of gene regulation, cellular differentiation, and human health. To address this need, we propose to systematically study epigenomic differences between cell types, lineages, and stages of differentiation, and to learn models predicting the higher-order chromatin structures across cell/tissue types, within each cell/tissue, and the chromatin architectures that they define. By systematically interpreting epigenomic annotations in the context of cellular differentiation and higher-order chromatin structure, the resulting resource will greatly increase the impact of these annotations on human health, enabling disease studies to uncover the mechanistic relationship between non-coding variants and their target genes, and the specific differentiation stages in which they act.