It has long been known that methylation of genomic DNA influences gene expression. The underlying structural mechanisms, however, remain largely obscure. In this project, we will pursue a new strategy for predicting how methylation affects transcription factor (TF) binding, thereby influencing the intricate genome-wide landscape of local chromatin structure and gene expression that characterizes each cell. We will explore the hypothesis that methylation causes local changes in DNA shape, which in turn modify TF binding affinity. Motivation comes from our recent analysis of the intrinsic specificity of the endonuclease DNase I. We found that cytosine methylation greatly increases the rate at which DNase I cleaves the DNA backbone adjacent to CpG dinucleotides. The explanation is that adding a methyl group in the major groove changes DNA shape in a way that locally narrows the minor groove and enhances the electrostatic interaction between the negatively charged backbone phosphates of the DNA and the positively charged amino-acid residues of DNase I. Recognition of DNA shape via the minor groove can also contribute to the binding specificity of eukaryotic TFs, suggesting that methylation sensitivity can be predicted from a shape-based analysis of TF binding preferences among unmethylated DNA sequences, for which ample high-throughput in vitro binding data are available. To explore this, we will first develop and fit models of TF binding specificity that integrate DNA base and shape readout by extending the biophysical model underlying our FeatureREDUCE algorithm to include information about DNA shape from computer simulations of free DNA molecules. Next, we will use these integrated base/shape recognition models to make predictions regarding the methylation sensitivity of TFs, and validate these experimentally.
In a parallel approach, we will extend our recently developed SELEX-seq method by using barcoded mixtures of methylated and unmethylated DNA ligands to create detailed maps of the effect of methylation on binding affinity for a representative set of TFs. Finally, we will analyze how the binding specificity of a TF depends on its amino-acid sequence using family-level modeling. Using biophysical base and shape recognition parameters estimated for a large number of TFs from the same structural TF family, along with a novel geometric representation of base preference, we will predict how the binding specificity of basic helix-loop-helix (bHLH) and basic leucine zipper (bZIP) proteins changes when amino-acid residues are mutated, and experimentally validate these predictions. We will use the same family-based approach to demonstrate the existence of alternative dimeric binding modes for bHLH factors, and investigate whether the propensity of a TF to use these alternative modes can be predicted from its protein sequence.
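
As a rough illustration of the integrated base and shape readout described above, the sketch below scores a binding site with an additive free-energy model: one term per base at each position plus one term per position for a local shape value such as minor groove width. This is not the FeatureREDUCE implementation. The function names, the single shape feature, and all weights are hypothetical and chosen only to show how a methylation-induced change in shape could shift a predicted relative affinity.

```python
import math

RT = 0.593  # kcal/mol at roughly 298 K
BASES = "ACGT"


def one_hot(seq):
    """Flatten a DNA string into position-specific base indicators (4 per position)."""
    return [1.0 if base == b else 0.0 for base in seq for b in BASES]


def relative_affinity(seq, shape, base_weights, shape_weights):
    """Return exp(-ddG/RT) under an additive base + shape model (illustrative only).

    seq           -- DNA string of length L
    shape         -- L shape values, e.g. deviation of minor groove width
                     from a reference (hypothetical input)
    base_weights  -- 4*L free-energy contributions in kcal/mol
    shape_weights -- L free-energy contributions per unit of the shape value
    """
    if len(shape) != len(seq) or len(shape_weights) != len(seq):
        raise ValueError("shape inputs must have one value per position")
    if len(base_weights) != 4 * len(seq):
        raise ValueError("base_weights must have four values per position")
    ddg = sum(w * x for w, x in zip(base_weights, one_hot(seq)))
    ddg += sum(w * s for w, s in zip(shape_weights, shape))
    return math.exp(-ddg / RT)


# Toy comparison: same sequence, but a (hypothetical) methylation-induced
# narrowing of the minor groove at the central position.
site = "ACG"
base_w = [0.0] * 12            # no base preference in this toy example
shape_w = [0.0, 0.4, 0.0]      # assumed penalty per unit of groove width
unmethylated = relative_affinity(site, [0.0, 0.0, 0.0], base_w, shape_w)
methylated = relative_affinity(site, [0.0, -0.5, 0.0], base_w, shape_w)
print(methylated / unmethylated)  # ratio above 1 means a predicted gain in affinity
```

In this toy setup the base terms are zero, so any difference between the two scores comes from the shape term alone, which mirrors the hypothesis that methylation acts on binding through local DNA shape.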

Public Health Relevance

Many organisms, including mammals, use epigenetic modifications of genomic DNA such as cytosine methylation to write a dynamic layer of cellular memory on top of the static information encoded by the genome sequence itself. While abnormal gain or loss of cytosine methylation has been shown to be associated with various diseases and to cause nearby genes to be inappropriately silenced or activated, the molecular mechanisms linking methylation with the regulation of gene expression are largely unknown. The goal of this proposal is to develop new experimental and computational tools to analyze how the binding of regulatory proteins to genomic DNA can be influenced by local changes in DNA shape in response to cytosine methylation.

National Institutes of Health (NIH)
National Human Genome Research Institute (NHGRI)
Research Project (R01)
Study Section
Special Emphasis Panel (ZRG1-GGG-F (02))
Program Officer
Pazin, Michael J
Columbia University (N.Y.)
Other Domestic Higher Education
New York
United States
Rastogi, Chaitanya; Rube, H Tomas; Kribelbauer, Judith F et al. (2018) Accurate and sensitive quantification of protein-DNA binding affinity. Proc Natl Acad Sci U S A 115:E3692-E3701
Rube, H Tomas; Rastogi, Chaitanya; Kribelbauer, Judith F et al. (2018) A unified approach for quantifying and interpreting DNA shape readout by transcription factors. Mol Syst Biol 14:e7902
Rao, Satyanarayan; Chiu, Tsu-Pei; Kribelbauer, Judith F et al. (2018) Systematic prediction of DNA shape changes due to CpG methylation explains epigenetic effects on protein-DNA binding. Epigenetics Chromatin 11:6
Zhang, Liyang; Martini, Gabriella D; Rube, H Tomas et al. (2018) SelexGLM differentiates androgen and glucocorticoid receptor DNA-binding preference over an extended binding site. Genome Res 28:111-121
Li, Jinsen; Sagendorf, Jared M; Chiu, Tsu-Pei et al. (2017) Expanding the repertoire of DNA shape features for genome-scale studies of transcription factor binding. Nucleic Acids Res 45:12877-12887
Sagendorf, Jared M; Berman, Helen M; Rohs, Remo (2017) DNAproDB: an interactive tool for structural analysis of DNA-protein complexes. Nucleic Acids Res 45:W89-W97
Kribelbauer, Judith F; Laptenko, Oleg; Chen, Siying et al. (2017) Quantitative Analysis of the DNA Methylation Sensitivity of Transcription Factor Complexes. Cell Rep 19:2383-2395
van Arensbergen, Joris; FitzPatrick, Vincent D; de Haas, Marcel et al. (2017) Genome-wide mapping of autonomous promoter activity in human cells. Nat Biotechnol 35:145-153
Bussemaker, Harmen J; Causton, Helen C; Fazlollahi, Mina et al. (2017) Network-based approaches that exploit inferred transcription factor activity to analyze the impact of genetic variation on gene expression. Curr Opin Syst Biol 2:98-102
Chiu, Tsu-Pei; Rao, Satyanarayan; Mann, Richard S et al. (2017) Genome-wide prediction of minor-groove electrostatic potential enables biophysical modeling of protein-DNA binding. Nucleic Acids Res 45:12565-12576

Showing the most recent 10 out of 55 publications