Genome-wide association studies (GWAS) have discovered thousands of genetic variants that are associated with hundreds of complex human diseases. However, the mechanisms by which these variants contribute to disease pathogenesis remain obscure. One of the main hurdles is that the majority of disease-associated variants are located in non-coding regions, whose annotations and functions have traditionally been poorly understood. Thanks to recent efforts by the ENCODE and Epigenome Roadmap projects, millions of potential non-coding regulatory elements have been identified in the human genome, mainly through high-throughput assays such as DNase-Seq and ChIP-Seq. More importantly, it has been shown that 77% of disease-associated SNPs are located within a potential regulatory region. However, very few studies have performed the functional experiments needed to elucidate how SNPs disrupt the function of a distal regulatory element and influence phenotypes. Another daunting task is identifying the target genes of the distal regulatory elements that harbor disease-associated SNPs. This is challenging because enhancers can act from either upstream or downstream of their target genes, and can be located as far as 1 million base pairs away and still function through chromatin looping. High-throughput methods based on chromosome conformation capture (3C), such as Hi-C, ChIA-PET, and Capture Hi-C, have emerged and represent an unprecedented opportunity to study higher-order chromatin structure genome-wide. However, data analysis and interpretation for 3C-type data are still in their early stages, and the complex relationship between chromatin interactions and gene regulation has only begun to be unraveled. The mechanisms by which TADs, sub-TADs, and domain boundaries are formed remain unclear.
On the other hand, the impact of 3D structure on gene transcription and the epigenetic landscape is also largely unknown, and whether these are the cause or the consequence of 3D genome structure has yet to be explored. Given the aforementioned challenges and my unique multi-disciplinary training, my long-term goal is to use a combination of high-throughput genomic experiments, computational modeling, and functional assays to address the following fundamental questions: 1) How can non-coding causal variants for human diseases be identified? 2) What is the molecular mechanism underlying the formation of 3D genome organization? 3) What is the impact of 3D genome organization on gene regulation and human diseases? The proposed work will deepen our understanding of how genetic variants contribute to gene regulation, 3D genome organization, and the molecular mechanisms underlying human diseases.
Although recent studies have discovered thousands of genetic variants that are associated with human diseases, the mechanisms by which these variants contribute to disease pathogenesis remain obscure, mainly because the majority of them are located in non-coding regions. Fortunately, great strides have recently been made in the functional annotation of the human genome and in the techniques used to investigate 3D genome organization, making it possible to link a non-coding variant to its target genes. My long-term goal is to use a combination of high-throughput genomics, computational modeling, and functional assays to investigate how genetic variants contribute to gene regulation, 3D genome organization, and the molecular mechanisms underlying human diseases.
|Wang, Yanli; Song, Fan; Zhang, Bo et al. (2018) The 3D Genome Browser: a web-based browser for visualizing 3D genome organization and long-range chromatin interactions. Genome Biol 19:151|
|Zhang, Yan; An, Lin; Xu, Jie et al. (2018) Enhancing Hi-C data resolution with deep convolutional neural network HiCPlus. Nat Commun 9:750|
|Dixon, Jesse R; Xu, Jie; Dileep, Vishnu et al. (2018) Integrative detection and analysis of structural variation in cancer genomes. Nat Genet 50:1388-1398|
|Coble, Joel L; Sheldon, Kathryn E; Yue, Feng et al. (2017) Identification of a rare LAMB4 variant associated with familial diverticulitis through exome sequencing. Hum Mol Genet 26:3212-3220|
|Yang, Tao; Zhang, Feipeng; Yardımcı, Galip Gürkan et al. (2017) HiCRep: assessing the reproducibility of Hi-C data using a stratum-adjusted correlation coefficient. Genome Res 27:1939-1949|