Disease-associated genetic variants are enriched in genomic regions that regulate gene expression. To gain a detailed understanding of disease mechanisms, a central question is how tissue-specific gene expression programs are regulated. Enhancers are a major family of regulatory elements with complex signatures, and they are abundant in the human genome. Enhancer regulation of gene expression is highly tissue-specific, is associated with combinatorial transcription factor binding, and involves long-range three-dimensional chromatin interactions. Characterizing enhancer regulatory networks at large scale is therefore challenging. The primary goal of this project is to develop a suite of probabilistic models and efficient machine learning algorithms to predict genome-wide enhancer regulatory networks across a diverse panel of cellular contexts, together with the molecular mechanisms that establish long-range interactions. These predictions will be leveraged to interpret disease-associated genetic variants.
In Aim 1, novel integrative graphical models will be developed to predict long-range chromatin interactions linking tissue-specific enhancers to their distal target genes, along with combinatorial transcription factor binding patterns.
In Aim 2, computational algorithms will be designed to interrogate how specific chromatin interactions are established, leading to mechanistic insights into chromatin formation.
In Aim 3, statistical models will be developed to integrate enhancer-gene regulatory networks with genetics data to predict, with improved statistical power and accuracy, which non-coding variants may disrupt regulatory links and cause disease. This modeling framework will substantially expand the ability to analyze non-coding variants and human disease mechanisms. Computational predictions from the three aims will be experimentally tested in mouse models of breast cancer development. This project will lead to both innovative computational tools and systematic biological insights into long-range enhancer regulation and its functional roles in human disease.
The advanced statistical models and machine learning algorithms will provide efficient big-data integration tools for epigenetics, gene regulation, and human genetics. Comprehensive predictions from this project will provide a valuable platform for delineating genetic variants and disease mechanisms. Novel mechanistic insights into disease at the regulatory and molecular levels will facilitate improved diagnostic and therapeutic approaches for human diseases.