It has long been known that methylation of genomic DNA influences gene expression. The underlying structural mechanisms, however, remain largely obscure. In this project, we will pursue a new strategy for predicting how methylation affects transcription factor (TF) binding, thereby influencing the intricate genome-wide landscape of local chromatin structure and gene expression that characterizes each cell. We will explore the hypothesis that methylation causes local changes in DNA shape, which in turn modify TF binding affinity. Motivation comes from our recent analysis of the intrinsic specificity of the endonuclease DNase I. We found that cytosine methylation greatly increases the rate at which DNase I cleaves the DNA backbone adjacent to CpG dinucleotides. The explanation is that adding a methyl group in the major groove changes DNA shape in a way that locally narrows the minor groove and enhances the electrostatic interaction between negative backbone phosphates of the DNA and positive amino-acid residues of DNase I. Recognition of DNA shape via the minor groove can also contribute to the binding specificity of eukaryotic TFs, suggesting that methylation sensitivity can be predicted from a shape-based analysis of TF binding preferences among unmethylated DNA sequences, for which ample high-throughput in vitro binding data are available. To explore this, we will first develop and fit models of TF binding specificity that integrate DNA base and shape readout by extending the biophysical model underlying our FeatureREDUCE algorithm to include information about DNA shape from computer simulations of free DNA molecules. Next, we will use these integrated base/shape recognition models to make predictions regarding the methylation sensitivity of TFs, and validate these predictions experimentally.
In a parallel approach, we will extend our recently developed SELEX-seq method by using barcoded mixtures of methylated and unmethylated DNA ligands to create detailed maps of the effect of methylation on binding affinity for a representative set of TFs. Finally, we will analyze how the binding specificity of a TF depends on its amino-acid sequence using family-level modeling. Using biophysical base and shape recognition parameters estimated for a large number of TFs from the same structural TF family, along with a novel geometric representation of base preference, we will predict how the binding specificity of basic helix-loop-helix (bHLH) and basic leucine zipper (bZIP) proteins changes when amino-acid residues are mutated, and experimentally validate these predictions. We will use the same family-based approach to demonstrate the existence of alternative dimeric binding modes for bHLH factors, and investigate whether the propensity of a TF to use these alternative modes can be predicted from its protein sequence.
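The idea of integrating base and shape readout can be illustrated with a minimal sketch. It is not the FeatureREDUCE model itself: it assumes a purely additive free energy (in units of RT) with one term per base and one term per position-specific shape feature. The function name, all parameter values, and the shape numbers below are hypothetical and chosen only for illustration.

```python
import math

def relative_affinity(seq, base_ddg, shape_coef, shape_values):
    """Relative affinity exp(-ddG/RT) under an additive base + shape model.

    base_ddg[i][b]  : free-energy penalty (units of RT) for base b at position i
    shape_coef[i]   : weight on the shape feature at position i
    shape_values[i] : shape feature at position i, e.g. the deviation of
                      minor groove width from a reference value
    """
    if not (len(seq) == len(base_ddg) == len(shape_coef) == len(shape_values)):
        raise ValueError("all inputs must have one entry per position")
    ddg = 0.0
    for i, base in enumerate(seq):
        ddg += base_ddg[i][base]                # base readout term
        ddg += shape_coef[i] * shape_values[i]  # shape readout term
    return math.exp(-ddg)

# Toy 3-bp site whose optimal sequence is "ACG" (made-up penalties).
base_ddg = [
    {"A": 0.0, "C": 1.0, "G": 1.0, "T": 1.0},
    {"A": 1.0, "C": 0.0, "G": 1.0, "T": 1.0},
    {"A": 1.0, "C": 1.0, "G": 0.0, "T": 1.0},
]
# Positive weights mean a narrower groove (negative deviation) lowers ddG.
shape_coef = [0.0, 0.5, 0.5]

# Assumption: methylation leaves the base terms unchanged and only shifts
# the shape feature at the CpG step (hypothetical values).
shape_unmethylated = [0.0, 0.0, 0.0]
shape_methylated = [0.0, -0.4, -0.4]

aff_unmeth = relative_affinity("ACG", base_ddg, shape_coef, shape_unmethylated)
aff_meth = relative_affinity("ACG", base_ddg, shape_coef, shape_methylated)
methylation_ratio = aff_meth / aff_unmeth  # > 1 means methylation favors binding
```

In this toy setting, a model fitted only on unmethylated sequences would supply `base_ddg` and `shape_coef`, and the predicted methylation effect would come entirely from the change in the shape feature.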

Public Health Relevance

Many organisms, including mammals, use epigenetic modifications of genomic DNA such as cytosine methylation to write a dynamic layer of cellular memory on top of the static information encoded by the genome sequence itself. While abnormal gain or loss of cytosine methylation has been shown to be associated with various diseases and cause nearby genes to be inappropriately silenced or activated, the molecular mechanisms linking methylation with gene expression regulation are largely unknown. The goal of this proposal is to develop new experimental and computational tools to analyze how the binding of regulatory proteins to genomic DNA can be influenced by local changes in DNA shape in response to cytosine methylation.

National Institutes of Health (NIH)
National Human Genome Research Institute (NHGRI)
Research Project (R01)
Study Section: Special Emphasis Panel (ZRG1-GGG-F (02))
Program Officer: Pazin, Michael J
Columbia University (N.Y.)
Other Domestic Higher Education
New York
United States
Fazlollahi, Mina; Muroff, Ivor; Lee, Eunjee et al. (2016) Identifying genetic modulators of the connectivity between transcription factors and their transcriptional targets. Proc Natl Acad Sci U S A 113:E1835-43
Chiu, Tsu-Pei; Comoglio, Federico; Zhou, Tianyin et al. (2016) DNAshapeR: an R/Bioconductor package for DNA shape prediction and feature encoding. Bioinformatics 32:1211-3
Bell, Robert J A; Rube, H Tomas; Xavier-Magalhães, Ana et al. (2016) Understanding TERT Promoter Mutations: A Common Path to Immortality. Mol Cancer Res 14:315-23
Zhou, Tianyin; Shen, Ning; Yang, Lin et al. (2015) Quantitative modeling of transcription factor binding specificities using DNA shape. Proc Natl Acad Sci U S A 112:4654-9
Lu, Xiang-Jun; Bussemaker, Harmen J; Olson, Wilma K (2015) DSSR: an integrated software tool for dissecting the spatial structure of RNA. Nucleic Acids Res 43:e142
Abe, Namiko; Dror, Iris; Yang, Lin et al. (2015) Deconvolving the recognition of DNA shape from sequence. Cell 161:307-18
Riley, Todd R; Lazarovici, Allan; Mann, Richard S et al. (2015) Building accurate sequence-to-affinity models from high-throughput in vitro protein-DNA binding data using FeatureREDUCE. Elife 4:
Dantas Machado, Ana Carolina; Zhou, Tianyin; Rao, Satyanarayan et al. (2015) Evolving insights on how cytosine methylation affects protein-DNA binding. Brief Funct Genomics 14:61-73
Chiu, Tsu-Pei; Yang, Lin; Zhou, Tianyin et al. (2015) GBshape: a genome browser database for DNA shape annotations. Nucleic Acids Res 43:D103-9
Bussemaker, Harmen J (2015) Recent progress in understanding transcription factor binding specificity. Brief Funct Genomics 14:1-2

Showing the most recent 10 out of 43 publications