Biological systems contain a large number of components whose physical interactions bring about cellular processes. A fundamental problem in molecular biology is to catalog these interactions and to decipher their functional consequences. High-throughput sequencing has made it possible to characterize some of these interactions rapidly, at high resolution, and in vivo (e.g., protein-DNA binding via ChIP-Seq and protein-RNA binding via RIP-Seq). But many interactions are not susceptible to these methods (e.g., RNA-RNA complexes, ncRNA-DNA binding, and, aside from recent work described below, DNA-DNA contacts and genome folding).

This gap may be bridged by coupling high-throughput sequencing with proximity-ligation-based methods. In proximity ligation, spatially proximate nucleic acids ligate to one another, forming a chimeric molecule. Observing a chimera composed of X and Y suggests that X and Y were near one another in the original sample. As a result, questions about spatial arrangement become questions about sequence composition, which high-throughput sequencing can answer. Nevertheless, developing these approaches is challenging: they involve subtle molecular biology and produce massive, high-dimensional datasets that require wholly new analytical paradigms, including extensive physical modeling.

We recently developed Hi-C, the first technology that couples proximity ligation and high-throughput sequencing in an unbiased, genome-wide fashion (Lieberman-Aiden et al., Science, 2009). Hi-C uses a DNA-DNA proximity ligation step to identify long-range physical contacts between genomic DNA loci in vivo.
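To make the "chimera implies proximity" logic concrete, here is a minimal Python sketch of how mapped chimeric reads can be tallied into a contact map. It assumes each chimera has already been aligned so that both halves carry a chromosome name and coordinate; the input format and the function name `bin_contacts` are hypothetical, and a real pipeline would also need read filtering, duplicate removal, and normalization.

```python
from collections import Counter
from typing import Iterable, Tuple

# Hypothetical input format: one mapped chimera per tuple,
# (chromosome_a, position_a, chromosome_b, position_b).
Chimera = Tuple[str, int, str, int]


def bin_contacts(chimeras: Iterable[Chimera], bin_size: int) -> Counter:
    """Count chimeric reads for each pair of fixed-size genomic bins.

    A chimera joining loci X and Y is taken as evidence that X and Y were
    spatially close in the sample, so each read adds one to the (X, Y) cell.
    """
    counts: Counter = Counter()
    for chrom_a, pos_a, chrom_b, pos_b in chimeras:
        a = (chrom_a, pos_a // bin_size)
        b = (chrom_b, pos_b // bin_size)
        # Contacts are unordered, so store each pair in a canonical order:
        # a read joining X to Y and one joining Y to X hit the same cell.
        counts[(a, b) if a <= b else (b, a)] += 1
    return counts


reads = [
    ("chr1", 1_200_000, "chr1", 5_700_000),
    ("chr1", 5_750_000, "chr1", 1_150_000),  # same contact, opposite orientation
    ("chr1", 1_300_000, "chr2", 900_000),    # inter-chromosomal contact
]
contact_map = bin_contacts(reads, bin_size=1_000_000)
print(contact_map[(("chr1", 1), ("chr1", 5))])  # -> 2
```

The bin size sets the map's resolution: each cell counts contacts between two fixed-size windows, so halving the bin size roughly quadruples the number of cells that the same reads must populate.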
We used Hi-C to create a low-resolution three-dimensional map of the human genome and made two significant discoveries: (1) genetic regulation is accompanied by the three-dimensional movement of genes from an 'on' compartment to an 'off' compartment, and vice versa; and (2) the fractal globule, a never-before-seen macromolecular state that couples extraordinary spatial density with a total absence of knots.

Here, we propose to dramatically extend this work by building a new generation of tools for systematically exploring the spatial organization of genomes, RNAs, and proteins, and by applying these tools to explore how RNAs and proteins establish and regulate the three-dimensional architecture of the genome. We will accomplish this through three specific research aims:
(1) We will create an ensemble of new technologies combining proximity ligation and sequencing to enable comprehensive mapping of (a) DNA-RNA contacts [via DNA-RNA proximity ligation]; (b) RNA-RNA complexes [via RNA-RNA proximity ligation]; and (c) selected protein-protein complexes [via probe-coupled proximity ligation]. We will use these methods to generate maps of biomolecular contacts in vivo.
(2) We will create high-resolution Hi-C maps of mammalian genomes, comprehensively mapping promoter-enhancer contacts and exploring large-scale organizational features such as transcription factories.
(3) We will develop new analytical approaches that combine the data produced by (1) and (2) with new (a) informatic tools, (b) computational analyses, (c) physical simulations, and (d) rigorous theoretical methods. We will characterize how physical interactions change during differentiation and tumorigenesis; identify the RNAs, proteins, and pathways that are most crucial in regulating genome folding; and produce detailed physical models of these pathways and of how they modulate the physical structure of the genome.

We plan to apply these techniques first to murine ES cells differentiating down a neural lineage, and later to differentiating human ES cells and to primary tumors.

This effort will produce powerful new molecular methods that will dramatically improve our ability to assess the spatial arrangement of cellular components. It will transform our understanding of how mammalian genomes fold inside the nucleus. It will reveal how specific physical interactions between DNA, RNA, and protein contribute to differentiation, tumorigenesis, and genome folding, and in the process suggest new drug targets. Finally, this work will generate a series of datasets that will serve as valuable resources for the scientific community as a whole.
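The 'on' and 'off' compartments described above are inferred from a contact map rather than observed directly. The following Python sketch shows a simplified version of the widely used eigenvector approach: divide each cell by the average contact count at that genomic distance (observed/expected), correlate the rows, and split bins by the sign of the leading eigenvector. The function names are hypothetical, the input is assumed to be a small, dense, symmetric intrachromosomal matrix, and plain power iteration stands in for a numerical linear-algebra library.

```python
import math
from typing import List

Matrix = List[List[float]]


def observed_over_expected(m: Matrix) -> Matrix:
    """Divide each cell by the mean contact count at its genomic distance |i - j|."""
    n = len(m)
    expected = []
    for d in range(n):
        diagonal = [m[i][i + d] for i in range(n - d)]
        expected.append(sum(diagonal) / len(diagonal))
    return [
        [m[i][j] / expected[abs(i - j)] if expected[abs(i - j)] > 0 else 0.0
         for j in range(n)]
        for i in range(n)
    ]


def row_correlations(m: Matrix) -> Matrix:
    """Pearson correlation between every pair of rows (bins)."""
    unit_rows = []
    for row in m:
        mean = sum(row) / len(row)
        centered = [x - mean for x in row]
        norm = math.sqrt(sum(x * x for x in centered))
        # A constant row has no defined correlation; treat it as all zeros.
        unit_rows.append([x / norm for x in centered] if norm > 0 else [0.0] * len(row))
    return [[sum(a * b for a, b in zip(r, s)) for s in unit_rows] for r in unit_rows]


def leading_eigenvector(m: Matrix, iterations: int = 500) -> List[float]:
    """Power iteration. A correlation matrix is positive semidefinite, so the
    dominant eigenvector found here belongs to the largest eigenvalue."""
    n = len(m)
    v = [1.0 / (i + 1) for i in range(n)]  # arbitrary non-constant start
    for _ in range(iterations):
        w = [sum(m[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            break
        v = [x / norm for x in w]
    return v


def call_compartments(contacts: Matrix) -> List[str]:
    """Label each bin 'A' or 'B' by the sign of the leading eigenvector.
    The sign is arbitrary: 'A' is not guaranteed to be the 'on' compartment."""
    eigvec = leading_eigenvector(row_correlations(observed_over_expected(contacts)))
    return ["A" if x > 0 else "B" for x in eigvec]


# Toy map: bins 0-1 and 2-3 form two groups; same-group contacts are enriched
# two-fold on top of a contact frequency that decays with genomic distance.
groups = [0, 0, 1, 1]
toy = [[(2.0 if groups[i] == groups[j] else 1.0) / (abs(i - j) + 1)
        for j in range(4)] for i in range(4)]
print(call_compartments(toy))  # -> ['A', 'A', 'B', 'B']
```

Because an eigenvector's sign is arbitrary, the labels only say which bins group together; deciding which group is the 'on' compartment requires outside information, such as gene density.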

Public Health Relevance

Biological systems contain a large number of components whose physical interactions bring about cellular processes, but our tools for identifying many of these biomolecular interactions are laborious and slow. We recently developed the Hi-C method for reconstructing the architecture of the human genome and will extend this technological approach to map interactions between DNA, RNA, and protein in vivo and at high throughput. We will use these maps to study how genome folding regulates cell function and to characterize the processes of cellular differentiation and tumorigenesis, identifying crucial biomolecular pathways and potential drug targets.

National Institutes of Health (NIH)
Office of The Director, National Institutes of Health (OD)
NIH Director’s New Innovator Awards (DP2)
Study Section: Special Emphasis Panel (ZGM1-NDIA-S (01))
Program Officer: Basavappa, Ravi
Baylor College of Medicine
United States
Matthews, Benjamin J; Dudchenko, Olga; Kingan, Sarah B et al. (2018) Improved reference genome of Aedes aegypti informs arbovirus vector control. Nature 563:501-507
Vian, Laura; Pękowska, Aleksandra; Rao, Suhas S P et al. (2018) The Energetics and Physiological Impact of Cohesin Extrusion. Cell 173:1165-1178.e20
Robinson, James T; Turner, Douglass; Durand, Neva C et al. (2018) Juicebox.js Provides a Cloud-Based Visualization System for Hi-C Data. Cell Syst 6:256-258.e1
Di Pierro, Michele; Cheng, Ryan R; Lieberman Aiden, Erez et al. (2017) De novo prediction of human chromosome structures: Epigenetic marking patterns encode genome architecture. Proc Natl Acad Sci U S A 114:12126-12131
Rao, Suhas S P; Huang, Su-Chen; Glenn St Hilaire, Brian et al. (2017) Cohesin Loss Eliminates All Loop Domains. Cell 171:305-320.e24
Eagen, Kyle P; Aiden, Erez Lieberman; Kornberg, Roger D (2017) Polycomb-mediated chromatin loops revealed by a subkilobase-resolution chromatin interaction map. Proc Natl Acad Sci U S A 114:8764-8769
Kieffer-Kwon, Kyong-Rim; Nimura, Keisuke; Rao, Suhas S P et al. (2017) Myc Regulates Chromatin Decompaction and Nuclear Architecture during B Cell Activation. Mol Cell 67:566-578.e10
Dudchenko, Olga; Batra, Sanjit S; Omer, Arina D et al. (2017) De novo assembly of the Aedes aegypti genome using Hi-C yields chromosome-length scaffolds. Science 356:92-95
Phanstiel, Douglas H; Van Bortle, Kevin; Spacek, Damek et al. (2017) Static and Dynamic DNA Loops form AP-1-Bound Activation Hubs during Macrophage Development. Mol Cell 67:1037-1048.e6
Durand, Neva C; Robinson, James T; Shamim, Muhammad S et al. (2016) Juicebox Provides a Visualization System for Hi-C Contact Maps with Unlimited Zoom. Cell Syst 3:99-101

The 10 most recent of 19 project publications are listed above.