Since the completion of the Human Genome Project, thousands of experiments from the ENCODE and Roadmap Epigenomics projects have profiled regulatory elements and the epigenetic landscape along the genome. More recently, the 4D Nucleome (4DN) Project has generated over 2,000 chromatin organization datasets, which provide complementary information about how these genomic and epigenomic elements are spatially organized in the nucleus. Joint analysis of 3D chromatin organization with the previously profiled 1D epigenome in different cell types will be a key step toward understanding the mechanisms underlying transcriptional regulation over long genomic distances. However, there are two challenges. First, there is a resolution mismatch between chromatin organization data (e.g., Hi-C contacts), which are usually measured at 10 kb resolution, and epigenome-based chromatin state features (e.g., ChIP-seq peaks), whose signals usually span tens to hundreds of base pairs. Second, existing computational approaches for analyzing the epigenome, such as those for annotating the genome and characterizing regulatory elements, all treat the DNA sequence as one-dimensional data, leaving important 3D structural information unused.
We aim to develop state-of-the-art deep learning approaches for three tasks: understanding the relationship between chromatin state features and chromatin organization, performing 3D and 4D genome annotation, and identifying spatially collaborative transcription factors. Upon completion of the proposed work, we expect to have: (1) an accurate and interpretable computational model to predict chromatin contact maps at nucleosome resolution for a wide range of cell lines, (2) 3D and 4D genome annotations over dynamic chromatin organization, regulatory elements, and epigenomic features, and (3) a computational method for identifying spatially collaborative transcription factors, which can help us understand the orchestration of noncoding genetic variants. These results will provide a fundamental understanding of disease-relevant genetic variation in light of the spatial organization of these genomic and epigenomic elements and their functional implications.
The goal of this project is to develop novel computational methods that jointly analyze 3D chromatin organization data and 1D epigenomic features and reveal the interplay among regulatory elements, epigenomic features, and dynamic chromatin organization. These new methods will help us understand the mechanisms of the genetic variants identified in genome-wide association studies and, through them, the genetic basis of many human diseases.