It has long been known that methylation of genomic DNA influences gene expression. The underlying structural mechanisms, however, remain largely obscure. In this project, we will pursue a new strategy for predicting how methylation affects transcription factor (TF) binding, thereby influencing the intricate genome-wide landscape of local chromatin structure and gene expression that characterizes each cell. We will explore the hypothesis that methylation causes local changes in DNA shape, which in turn modify TF binding affinity. Motivation comes from our recent analysis of the intrinsic specificity of the endonuclease DNase I. We found that cytosine methylation greatly increases the rate at which DNase I cleaves the DNA backbone adjacent to CpG dinucleotides. The explanation is that adding a methyl group in the major groove changes DNA shape, locally narrowing the minor groove and enhancing the electrostatic interaction between the negatively charged backbone phosphates of the DNA and positively charged amino-acid residues of DNase I. Recognition of DNA shape via the minor groove can also contribute to the binding specificity of eukaryotic TFs, suggesting that methylation sensitivity can be predicted from a shape-based analysis of TF binding preferences among unmethylated DNA sequences, for which ample high-throughput in vitro binding data are available. To explore this, we will first develop and fit models of TF binding specificity that integrate DNA base and shape readout by extending the biophysical model underlying our FeatureREDUCE algorithm to include information about DNA shape from computer simulations of free DNA molecules. Next, we will use these integrated base/shape recognition models to make predictions regarding the methylation sensitivity of TFs, and validate these predictions experimentally.
In a parallel approach, we will extend our recently developed SELEX-seq method by using barcoded mixtures of methylated and unmethylated DNA ligands to create detailed maps of the effect of methylation on binding affinity for a representative set of TFs. Finally, we will analyze how the binding specificity of a TF depends on its amino-acid sequence using family-level modeling. Using biophysical base and shape recognition parameters estimated for a large number of TFs from the same structural TF family, along with a novel geometric representation of base preference, we will predict how the binding specificity of basic helix-loop-helix (bHLH) and basic leucine zipper (bZIP) proteins changes when amino-acid residues are mutated, and experimentally validate these predictions. We will use the same family-based approach to demonstrate the existence of alternative dimeric binding modes for bHLH factors, and investigate whether the propensity of a TF to use these alternative modes can be predicted from its protein sequence.

Public Health Relevance

Many organisms, including mammals, use epigenetic modifications of genomic DNA such as cytosine methylation to write a dynamic layer of cellular memory on top of the static information encoded by the genome sequence itself. While abnormal gain or loss of cytosine methylation has been shown to be associated with various diseases and to cause nearby genes to be inappropriately silenced or activated, the molecular mechanisms linking methylation to the regulation of gene expression are largely unknown. The goal of this proposal is to develop new experimental and computational tools to analyze how the binding of regulatory proteins to genomic DNA can be influenced by local changes in DNA shape in response to cytosine methylation.

National Institutes of Health (NIH)
National Human Genome Research Institute (NHGRI)
Research Project (R01)
Study Section
Special Emphasis Panel (ZRG1-GGG-F (02))
Program Officer
Good, Peter J
Columbia University (N.Y.)
Other Domestic Higher Education
New York
United States
Chiu, Tsu-Pei; Yang, Lin; Zhou, Tianyin et al. (2015) GBshape: a genome browser database for DNA shape annotations. Nucleic Acids Res 43:D103-9
Dantas Machado, Ana Carolina; Zhou, Tianyin; Rao, Satyanarayan et al. (2015) Evolving insights on how cytosine methylation affects protein-DNA binding. Brief Funct Genomics 14:61-73
Slattery, Matthew; Zhou, Tianyin; Yang, Lin et al. (2014) Absence of a simple code: how transcription factors read the genome. Trends Biochem Sci 39:381-99
Riley, Todd R; Slattery, Matthew; Abe, Namiko et al. (2014) SELEX-seq: a method for characterizing the complete repertoire of binding site preferences for transcription factor complexes. Methods Mol Biol 1196:255-78
Ghosh, Hiyaa S; Ceribelli, Michele; Matos, Ines et al. (2014) ETO family protein Mtg16 regulates the balance of dendritic cell subsets by repressing Id2. J Exp Med 211:1623-35
Zhang, Xiaojun; Dantas Machado, Ana Carolina; Ding, Yuan et al. (2014) Conformations of p53 response elements in solution deduced using site-directed spin labeling and Monte Carlo sampling. Nucleic Acids Res 42:2789-97
Lee, Eunjee; de Ridder, Jeroen; Kool, Jaap et al. (2014) Identifying regulatory mechanisms underlying tumorigenesis using locus expression signature analysis. Proc Natl Acad Sci U S A 111:5747-52
van Arensbergen, Joris; van Steensel, Bas; Bussemaker, Harmen J (2014) In search of the determinants of enhancer-promoter interaction specificity. Trends Cell Biol 24:695-702
Dror, Iris; Zhou, Tianyin; Mandel-Gutfreund, Yael et al. (2014) Covariation between homeodomain transcription factors and the shape of their DNA binding sites. Nucleic Acids Res 42:430-41
Yang, Lin; Zhou, Tianyin; Dror, Iris et al. (2014) TFBSshape: a motif database for DNA shape features of transcription factor binding sites. Nucleic Acids Res 42:D148-55

Showing the most recent 10 out of 32 publications