It has long been known that methylation of genomic DNA influences gene expression. The underlying structural mechanisms, however, largely remain obscure. In this project, we will pursue a new strategy for predicting how methylation affects transcription factor (TF) binding, thereby influencing the intricate genome-wide landscape of local chromatin structure and gene expression that characterizes each cell. We will explore the hypothesis that methylation causes local changes in DNA shape, which in turn modify TF binding affinity. Motivation comes from our recent analysis of the intrinsic specificity of the endonuclease DNase I. We found that cytosine methylation greatly increases the rate at which DNase I cleaves the DNA backbone adjacent to CpG dinucleotides. The explanation for this is that adding a methyl group in the major groove causes changes in DNA shape that locally narrow the minor groove and enhance the electrostatic interaction between negative backbone phosphates of the DNA and positive amino-acid residues of DNase I. Recognition of DNA shape via the minor groove can also contribute to the binding specificity of eukaryotic TFs, suggesting that methylation sensitivity can be predicted from a shape-based analysis of TF binding preferences among unmethylated DNA sequences, for which ample high-throughput in vitro binding data are available. To explore this, we will first develop and fit models of TF binding specificity that integrate DNA base and shape readout by extending the biophysical model underlying our FeatureREDUCE algorithm to include information about DNA shape from computer simulations of free DNA molecules. Next, we will use these integrated base/shape recognition models to make predictions regarding the methylation sensitivity of TFs, and validate these predictions experimentally.
In a parallel approach, we will extend our recently developed SELEX-seq method by using barcoded mixtures of methylated and unmethylated DNA ligands to create detailed maps of the effect of methylation on binding affinity for a representative set of TFs. Finally, we will analyze how the binding specificity of a TF depends on its amino-acid sequence using family-level modeling. Using biophysical base and shape recognition parameters estimated for a large number of TFs from the same structural TF family, along with a novel geometric representation of base preference, we will predict how the binding specificity of basic helix-loop-helix (bHLH) and basic leucine zipper (bZIP) proteins changes when amino-acid residues are mutated, and experimentally validate these predictions. We will use the same family-based approach to demonstrate the existence of alternative dimeric binding modes for bHLH factors, and investigate whether the propensity of a TF to use these alternative modes can be predicted from its protein sequence.
Many organisms, including mammals, use epigenetic modifications of genomic DNA such as cytosine methylation to write a dynamic layer of cellular memory on top of the static information encoded by the genome sequence itself. While abnormal gain or loss of cytosine methylation has been shown to be associated with various diseases and to cause nearby genes to be inappropriately silenced or activated, the molecular mechanisms linking methylation with gene expression regulation are largely unknown. The goal of this proposal is to develop new experimental and computational tools to analyze how the binding of regulatory proteins to genomic DNA can be influenced by local changes in DNA shape in response to cytosine methylation.