Since the completion of the Human Genome Project, several landmark consortia have accumulated large amounts of genomic data toward understanding the functions of the human genome. The ENCODE project has annotated genome-wide regulatory elements. The Roadmap Epigenomics project has characterized tissue-specific variation in epigenetic state. The NIH Common Fund GTEx project has delineated tissue-specific gene expression and transcriptional regulation. The NIH Common Fund 4D Nucleome (4DN) project has revealed dynamic 3D chromatin organization in many cell and tissue types. Each of these consortia has generated thousands or even tens of thousands of datasets and provided distinct insights into the human genome at an unprecedented scale and depth. However, the datasets generated by these consortia are isolated from one another in the cell and tissue types they cover, in how the data are stored, and in the resolution of the genomic data. These gaps pose practical data analysis challenges for biomedical researchers who use these public datasets jointly: they must navigate different data portals with heterogeneous processing pipelines, different data formats, and unmatched resolutions.
We aim to develop state-of-the-art deep learning approaches to impute high-resolution chromatin contact maps, and to integrate these maps with transcriptional data from the GTEx project and epigenomic data from ENCODE/Roadmap. We plan to share the integrated data on a public web server with a multi-panel, interactive genome browser for visualization. The integrated data will provide an important resource for understanding tissue-specific genetic variation in light of the spatial organization of these genomic and epigenomic elements and their functional implications.
The goal of this project is to develop novel computational methods to integrate 4DN datasets with GTEx and ENCODE/Roadmap datasets. The integrated datasets will be a critical resource for unveiling the mechanisms of the genetic variants identified in genome-wide association studies. The new knowledge gained could help us understand the genetic basis of many human diseases.