Defining the regulatory networks altered in disease can provide not only insight into the mechanisms underlying disease, but also possible therapeutic targets. Factors such as genetic variation and DNA methylation can disrupt the interaction between transcription factors (TFs) and cis-regulatory regions (e.g., promoters and enhancers) and thus alter regulatory networks. However, identifying the altered networks in disease remains challenging. First, to identify the genetic variants and methylation sites that play a role in gene regulation, we need to map them onto the regulatory regions that are specific to the pathological tissue. While DNase I hypersensitivity sites (DHSs) and histone mark profiles are powerful tools for determining regulatory regions, not every laboratory is equipped to measure DHSs and histone marks in the tissues of interest. Therefore, we need a computational algorithm that is accurate enough to differentiate the regulatory regions of diseased and normal samples. Second, although a large number of differentially methylated sites have been identified for different diseases, their functional roles remain largely unclear. DNA methylation has generally been considered a potent epigenetic modification that prevents TF recruitment, resulting in transcriptional suppression. However, recent studies and our own preliminary results show that some TFs preferentially bind to methylated DNA, an interaction that in some cases activates gene transcription. Therefore, we need to identify such TFs and incorporate these methylation-dependent TF-DNA interactions into the computational platform. Third, we need a unified computational framework to incorporate these diverse types of factors that could alter the regulatory networks.
To address these challenges, we will develop a computational framework that incorporates the effects of genetic and epigenetic variation and identifies the regulatory networks altered by these effects. Within this framework, we will develop a computational approach to predict regulatory regions in a tissue of interest by integrating various epigenetic datasets (Aim 1). Our approach is analogous to homology modeling for protein structure prediction in that it makes full use of the existing epigenetic datasets from the ENCODE project. We will then develop a model that quantifies the interaction strength between TFs and DNA, taking into account genetic variation, DNA methylation, and TF concentration (Aim 2). This model will incorporate our new discovery that some TFs preferentially bind to methylated DNA motifs. We will then apply the computational framework to age-related macular degeneration (AMD), the leading cause of vision loss in Americans aged 60 and older, and experimentally evaluate the altered regulatory networks in AMD (Aim 3). Finally, we will make our software and the AMD regulatory networks available through an interactive, user-friendly database (Aim 4).
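As a minimal illustration of the kind of quantity Aim 2 targets, the sketch below scores a binding site with a per-position energy matrix over an extended alphabet (A, C, G, T, plus M for 5-methylcytosine) and converts that score into an occupancy using a simple two-state form, occupancy = [TF] / ([TF] + K_d). The matrix values, the symbol M, the function names, and the energy scale are all assumptions chosen for illustration; this is a sketch under those assumptions, not the model to be developed.

```python
import math

# Hypothetical per-position binding energies (arbitrary units, lower = tighter).
# "M" denotes 5-methylcytosine. All values are invented for illustration only.
ENERGY_MATRIX = [
    {"A": 2.0, "C": 1.0, "G": 2.0, "T": 0.0, "M": 1.5},
    {"A": 2.0, "C": 1.0, "G": 2.0, "T": 2.0, "M": 0.0},  # methylated C preferred here
    {"A": 2.0, "C": 2.0, "G": 0.0, "T": 2.0, "M": 2.0},
]


def site_energy(site, matrix=ENERGY_MATRIX):
    """Sum the per-position energies for a site of the same length as the matrix."""
    if len(site) != len(matrix):
        raise ValueError("site length must match matrix length")
    return sum(column[base] for column, base in zip(matrix, site))


def occupancy(site, tf_conc, matrix=ENERGY_MATRIX):
    """Two-state occupancy [TF] / ([TF] + Kd), assuming Kd = exp(site energy).

    tf_conc and Kd share the same arbitrary concentration units.
    """
    kd = math.exp(site_energy(site, matrix))
    return tf_conc / (tf_conc + kd)


if __name__ == "__main__":
    # Compare an unmethylated site, its methylated form, and a sequence variant.
    for site in ("TCG", "TMG", "TMA"):
        print(site, round(occupancy(site, tf_conc=1.0), 3))
```

In this toy setting, methylating the central cytosine (TCG to TMG) lowers the site energy and raises the predicted occupancy, while a sequence variant at the third position (TMG to TMA) lowers it; raising `tf_conc` increases occupancy for every site. A realistic model would need experimentally derived energies and a treatment of both DNA strands.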
In this proposal, we will develop novel computational approaches that predict altered regulatory relationships by integrating a variety of datasets. We will then apply these approaches to age-related macular degeneration, the leading cause of vision loss in Americans aged 60 and older. The predictions obtained from this project will not only provide insight into the mechanisms underlying human disease, but also help identify possible therapeutic targets.