Title: Genome analysis based on the integration of DNA sequence and shape
PI: Rohs, Remo (USC); Co-I: Noble, William Stafford (UW); Co-I: Tullius, Thomas D. (BU)

PROJECT SUMMARY

Current techniques for genome analysis are mainly based on the one-dimensional DNA sequence, composed of the letters A, C, G, and T. However, proteins recognize DNA as a three-dimensional (3D) object. Nuances in DNA shape at single-nucleotide resolution play a crucial role in the binding specificity of transcription factors (TFs), including those involved in embryonic development and human cancer.

This project involves the development of a battery of tools for genome analysis, through the integration of information derived from the DNA sequence and the 3D structure of DNA, or "DNA shape". The basis for these novel tools is a high-throughput (HT) method for the prediction of multiple features of local DNA shape at the genomic scale. Data will be made available to the community in the UCSC Genome Browser track format through a web server interface. These tools will enable users to analyze the shape of any number or length of DNA sequences, including whole genomes and the effect of DNA methylation. HT shape predictions will be validated based on X-ray crystallography, NMR spectroscopy, and hydroxyl radical cleavage data. Predictions will be combined with ORChID, an ENCODE project that infers DNA minor groove geometry from hydroxyl radical cleavage experiments.

The HT method will be used to study how paralogous TFs select different target sites in vivo despite sharing core-binding motifs or having similar binding properties in vitro. To study this question, we will investigate the effect of flanking sequences on multiple structural features of TF binding sites (TFBSs). The initial focus of this study will be homeodomains and basic helix-loop-helix (bHLH) TFs.
Other protein families will later be included and used to construct a comprehensive TFBS database that provides shape features for binding motifs derived from JASPAR and other motif databases.

Structural effects of single nucleotide polymorphisms (SNPs) will also be analyzed. Some SNPs are associated with deleterious functions, whereas others have no apparent effect. The HT shape prediction method will be used to predict the function of SNPs in non-coding regions based on DNA shape. We will correlate quantitative effects of SNPs on DNA structure with expression quantitative trait loci (eQTLs) and genome-wide association study (GWAS) signals, to develop a predictive tool for the functional effect of SNPs.

The HT shape prediction approach will be used to design DNA sequences with different AT/GC contents but similar shapes. The relative contributions of sequence and shape to binding will be tested with analytic models including multiple linear regression (MLR) and support vector regression (SVR). For systems in which the integration of sequence and shape proves advantageous, novel motif-finding tools will be developed based on an extended alphabet that combines sequence with informative structural features, selected by machine learning and feature selection approaches. Sequence+shape motifs will be tested by motif scanning, compared to sequence-only motifs, and integrated into the MEME Suite. The goal of this sequence-shape integration is to increase the accuracy of finding in vivo TFBSs in the genome.
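
The idea of combining sequence with structural features can be illustrated with a minimal sketch. It is not the project's method: the window length `k=5`, the function names, and in particular the `toy_shape` score are assumptions made for illustration only. `toy_shape` returns the A/T fraction of a window as a stand-in; a real pipeline would instead look up predicted shape features (for example minor groove width) for each window. The sketch only shows how a one-hot sequence encoding could be extended with per-position shape values to form the input of a regression model such as MLR or SVR.

```python
from typing import Callable, List

BASES = "ACGT"


def one_hot(seq: str) -> List[float]:
    """Encode each base as four 0/1 indicators in the order A, C, G, T."""
    out: List[float] = []
    for base in seq.upper():
        if base not in BASES:
            raise ValueError(f"unexpected base: {base!r}")
        out.extend(1.0 if base == b else 0.0 for b in BASES)
    return out


def toy_shape(window: str) -> float:
    """PLACEHOLDER score: fraction of A/T bases in the window.

    This is NOT a structural measurement. It stands in for a lookup of a
    predicted shape feature and exists only to make the sketch runnable.
    """
    return sum(base in "AT" for base in window) / len(window)


def shape_profile(
    seq: str, k: int = 5, score: Callable[[str], float] = toy_shape
) -> List[float]:
    """Slide a window of length k along seq and score each window.

    Each value belongs to the central position of its window, so a
    sequence of length n yields n - k + 1 values (the ends are skipped).
    The window length k=5 is an assumption of this sketch.
    """
    seq = seq.upper()
    return [score(seq[i:i + k]) for i in range(len(seq) - k + 1)]


def sequence_plus_shape(seq: str, k: int = 5) -> List[float]:
    """Concatenate the one-hot sequence encoding with the shape profile."""
    return one_hot(seq) + shape_profile(seq, k)


if __name__ == "__main__":
    features = sequence_plus_shape("AATTGC")
    print(len(features))  # 24 one-hot values + 2 window scores
```

A sequence-only model would use `one_hot(seq)` alone, so fitting the same regression on both feature vectors gives one way to compare the contributions of sequence and shape.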

Public Health Relevance

Protein-DNA recognition is a critical yet poorly understood component of gene regulation. This proposal will connect the fields of DNA sequence and structure analysis, which so far have been developed in parallel but largely disconnected from each other. Integration of the one-dimensional DNA sequence at a genome-wide scale with the three-dimensional DNA structure at atomic resolution will lead to the development of novel genome analysis tools and will advance our understanding of genome function, leading to fundamentally new insights into the mechanisms of gene regulation and its impact on human disease.

National Institutes of Health (NIH)
Research Project (R01)
Study Section: Genomics, Computational Biology and Technology Study Section (GCAT)
Program Officer: Krasnewich, Donna M
University of Southern California
Schools of Arts and Sciences
Los Angeles
United States
Schöne, Stefanie; Jurk, Marcel; Helabad, Mahdi Bagherpoor et al. (2016) Sequences flanking the core-binding site modulate glucocorticoid receptor structure and activity. Nat Commun 7:12621
Dror, Iris; Rohs, Remo; Mandel-Gutfreund, Yael (2016) How motif environment influences transcription factor search dynamics: Finding a needle in a haystack. Bioessays 38:605-12
Mathelier, Anthony; Xin, Beibei; Chiu, Tsu-Pei et al. (2016) DNA Shape Features Improve Transcription Factor Binding Site Predictions In Vivo. Cell Syst 3:278-286.e4
Chiu, Tsu-Pei; Comoglio, Federico; Zhou, Tianyin et al. (2016) DNAshapeR: an R/Bioconductor package for DNA shape prediction and feature encoding. Bioinformatics 32:1211-3
Kuzu, Guray; Kaye, Emily G; Chery, Jessica et al. (2016) Expansion of GA Dinucleotide Repeats Increases the Density of CLAMP Binding Sites on the X-Chromosome to Promote Drosophila Dosage Compensation. PLoS Genet 12:e1006120
Tangprasertchai, Narin S; Zhang, Xiaojun; Ding, Yuan et al. (2015) An Integrated Spin-Labeling/Computational-Modeling Approach for Mapping Global Structures of Nucleic Acids. Methods Enzymol 564:427-53
Dror, Iris; Golan, Tamar; Levy, Carmit et al. (2015) A widespread role of the motif environment in transcription factor binding across diverse protein families. Genome Res 25:1268-80
Deng, Zengqin; Wang, Qing; Liu, Zhao et al. (2015) Mechanistic insights into metal ion activation and operator recognition by the ferric uptake regulator. Nat Commun 6:7642
Zentner, Gabriel E; Kasinathan, Sivakanthan; Xin, Beibei et al. (2015) ChEC-seq kinetics discriminates transcription factor binding sites by DNA sequence and shape in vivo. Nat Commun 6:8733
Levo, Michal; Zalckvar, Einat; Sharon, Eilon et al. (2015) Unraveling determinants of transcription factor binding outside the core binding site. Genome Res 25:1018-29

Showing the most recent 10 out of 19 publications