The human genome, composed of 46 DNA molecules (chromosomes), stretches for nearly two meters when fully extended, yet must fit into a cell nucleus that is only about 10 micrometers in diameter. A major unsolved question is how the genome is packaged and organized in this tiny three-dimensional space. This project seeks to address this question by developing novel theoretical and computational approaches to characterize the three-dimensional organization of the human genome. A Google Maps-like resource for the genome will be designed so that one can easily navigate its hierarchical organization across multiple length scales. This resource will be an invaluable aid for studying the genetic mechanisms by which information in the DNA genome is converted into the RNA and protein molecules that constitute the "workhorses" of the cell. The project will also provide excellent interdisciplinary training that bridges chemistry, physics, and biology for high school students, undergraduates, graduate students, and postdoctoral fellows.

This project aims to build a predictive genome model with an integrative framework that combines statistical mechanics and computational modeling with bioinformatics analysis. Novel theoretical approaches will be developed to derive the two key components of the model: (1) an input sequence that uniquely represents a chromosome from a given cell type; and (2) a force field that describes the potential energy surface whose global minimum determines the most stable genome structure. To capture the variation of genome conformation across cell types, epigenetic information (including histone modifications, which are reflective of "chromatin states") will be superimposed on the DNA sequence and used as input. Physical attributes will be used to maximize the information entropy. Long-range contact potentials will be derived from existing, publicly available genome-wide chromosome conformation capture data using a rigorous statistical optimization algorithm. This predictive genome model will not only enable a high-resolution characterization of genome organization across a wide range of cell types, but will also help uncover the underlying physical principles and driving forces for genome folding. Moreover, the model will add value to the foundational biological datasets (from the human ENCODE project) and enable the formulation of testable hypotheses relating genome organization to functional output.
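
As a rough illustration of how contact potentials might be fitted to contact-frequency data, the sketch below iteratively adjusts pairwise contact energies until model contact probabilities match target values. It is a deliberately simplified toy, not the project's algorithm: each contact is assumed to be an independent two-state variable with energy in units of kT, so no polymer simulation is involved. The function names, the learning rate, and the target frequencies are all hypothetical and chosen only for illustration.

```python
import math


def model_contact_probability(alpha):
    """Contact probability of an independent two-state contact.

    Toy assumption: the contact has energy `alpha` (units of kT) when
    formed and 0 when broken, so p = 1 / (1 + exp(alpha)).
    """
    return 1.0 / (1.0 + math.exp(alpha))


def fit_contact_energies(target, learning_rate=1.0, n_iter=5000, tol=1e-10):
    """Iteratively fit contact energies to target contact probabilities.

    `target` maps a pair of locus indices to a contact probability
    strictly between 0 and 1. If the model forms a contact too often,
    its energy is raised; if too rarely, it is lowered.
    """
    alphas = {pair: 0.0 for pair in target}
    for _ in range(n_iter):
        max_err = 0.0
        for pair, p_exp in target.items():
            err = model_contact_probability(alphas[pair]) - p_exp
            alphas[pair] += learning_rate * err
            max_err = max(max_err, abs(err))
        if max_err < tol:
            break
    return alphas


# Hypothetical target contact frequencies for three locus pairs.
target_contacts = {(0, 2): 0.30, (0, 5): 0.05, (3, 9): 0.12}
fitted = fit_contact_energies(target_contacts)

for pair, alpha in sorted(fitted.items()):
    p = model_contact_probability(alpha)
    print(f"pair {pair}: energy = {alpha:.3f} kT, model p = {p:.3f}")
```

In a realistic setting the model contact probabilities would come from simulations of a polymer model rather than from a closed-form expression, and the contacts would not be independent; the update rule is shown here only to convey the general idea of matching model and experimental contact frequencies.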

This award is co-funded by the Genetic Mechanisms Program in the Division of Molecular and Cellular Biosciences in the Biological Sciences Directorate and by the Program for Computational and Data-Enabled Science and Engineering in Mathematical and Statistical Sciences in the Division of Mathematical Sciences in the Mathematical and Physical Sciences Directorate.

National Science Foundation (NSF)
Division of Molecular and Cellular Biosciences (MCB)
Program Officer: Karen Cone
Massachusetts Institute of Technology
United States