Genome-wide association studies (GWAS) have discovered thousands of genetic variants that are associated with hundreds of complex human diseases. However, the underlying mechanisms by which these variants contribute to disease pathogenesis remain obscure. One of the main hurdles is that the majority of disease-associated variants are located in non-coding regions, whose annotations and functions have traditionally been poorly understood. Thanks to recent efforts by the ENCODE and Epigenome Roadmap projects, millions of potential non-coding regulatory elements have been identified in the human genome, mainly based on high-throughput assays such as DNase-Seq and ChIP-Seq. More importantly, it has been shown that 77% of disease-associated SNPs are located within a potential regulatory region. However, very few studies have performed the functional experiments needed to elucidate how SNPs disrupt the function of a distal regulatory element and influence phenotypes. Another daunting task is identifying the target genes of the distal regulatory elements that harbor disease-associated SNPs. This is a challenging problem because enhancers can act from either upstream or downstream of their target genes, and can be located as far as 1 million base pairs away and still function through chromatin looping. High-throughput methods based on Chromatin Conformation Capture (3C), such as Hi-C, ChIA-PET, and Capture Hi-C, have emerged and represent an unprecedented opportunity to study higher-order chromatin structure genome-wide. However, data analysis and interpretation for 3C-type data are still in their early stages, and the complex relationship between chromatin interactions and gene regulation has only begun to be unraveled. The mechanisms by which TADs, sub-TADs, and domain boundaries are formed remain unclear. In addition, the impact of 3D structure on gene transcription and the epigenetic landscape is largely unknown, and whether these are the cause or the consequence of 3D genome structure is yet to be explored.
Given the aforementioned challenges and my unique multi-disciplinary training, my long-term goal is to use a combination of high-throughput genomic experiments, computational modeling, and functional assays to address the following fundamental questions: 1) How can non-coding causal variants for human diseases be identified? 2) What is the molecular mechanism for the formation of 3D genome organization? 3) What is the impact of 3D genome organization on gene regulation and human diseases? The proposed work will deepen our understanding of how genetic variants contribute to gene regulation, 3D genome organization, and the molecular mechanisms underlying human diseases.

Public Health Relevance

Although recent studies have discovered thousands of genetic variants that are associated with human diseases, the underlying mechanisms by which these variants contribute to disease pathogenesis remain obscure, mainly because the majority of them are located in non-coding regions. Fortunately, great strides have been made recently in the functional annotation of the human genome and in the techniques used to investigate 3D genome organization, making it possible to link a non-coding variant to its target genes. My long-term goal is to use a combination of high-throughput genomics, computational modeling, and functional assays to investigate how genetic variants contribute to gene regulation, 3D genome organization, and the molecular mechanisms underlying human diseases.

Agency
National Institutes of Health (NIH)
Institute
National Institute of General Medical Sciences (NIGMS)
Type
Unknown (R35)
Project #
5R35GM124820-02
Application #
9551003
Study Section
Special Emphasis Panel (ZRG1)
Program Officer
Krasnewich, Donna M
Project Start
2017-09-01
Project End
2022-07-31
Budget Start
2018-08-01
Budget End
2019-07-31
Support Year
2
Fiscal Year
2018
Total Cost
Indirect Cost
Name
Pennsylvania State University
Department
Biochemistry
Type
Schools of Medicine
DUNS #
129348186
City
Hershey
State
PA
Country
United States
Zip Code
17033
Wang, Yanli; Song, Fan; Zhang, Bo et al. (2018) The 3D Genome Browser: a web-based browser for visualizing 3D genome organization and long-range chromatin interactions. Genome Biol 19:151
Zhang, Yan; An, Lin; Xu, Jie et al. (2018) Enhancing Hi-C data resolution with deep convolutional neural network HiCPlus. Nat Commun 9:750
Dixon, Jesse R; Xu, Jie; Dileep, Vishnu et al. (2018) Integrative detection and analysis of structural variation in cancer genomes. Nat Genet 50:1388-1398
Coble, Joel L; Sheldon, Kathryn E; Yue, Feng et al. (2017) Identification of a rare LAMB4 variant associated with familial diverticulitis through exome sequencing. Hum Mol Genet 26:3212-3220
Yang, Tao; Zhang, Feipeng; Yardımcı, Galip Gürkan et al. (2017) HiCRep: assessing the reproducibility of Hi-C data using a stratum-adjusted correlation coefficient. Genome Res 27:1939-1949