We will develop the first validated predictive model, generalizable across biological systems, of how transcription factors dynamically determine genome-wide chromatin accessibility. We will accomplish this goal with three specific aims. We will develop novel Genome Syntax to Regulation (GSR) models that accurately learn a genomic regulatory vocabulary and predict how phrases in this vocabulary control chromatin accessibility (Aim 1). As part of this aim we will identify transcription factor binding motifs that are in the discovered regulatory vocabulary. We will validate and refine the causal predictions of these models by testing whether they accurately predict the chromatin accessibility of thousands of synthetic DNA phrases that have been engineered into specific genomic locations and measured in the context of transcription factor gain-of-function and loss-of-function studies. The phrases will be designed to elucidate both the factors and the grammar that control chromatin opening in several distinct cellular states (Aim 2). We will use our predictive models to assign importance scores to individual genome bases and to predict how selected factors alter chromatin accessibility genome-wide (Aim 3). We will test the ability of our importance scores to identify regulatory SNPs in the context of human genome-wide association study (GWAS) data, and we will validate model predictions of changes in whole-genome chromatin accessibility in response to ectopic factor expression. Through computational modeling of the effect of such ectopic factor expression, we will develop a predictive understanding of how transcription factors alter chromatin state, laying the groundwork for a novel regenerative medicine paradigm of predictive cellular programming.

Public Health Relevance

A cell regulates access to the information in its genome much as doors in a building regulate access to rooms that contain instructions. Which doors are open determines which instructions are accessible, in a cell-type-specific way. We will decipher the code that controls the doors to our genome, and improve human health by identifying which genome changes interfere with door control and by demonstrating that we can program cells to open and close their doors, creating cells that might serve as replacements for damaged cells in our bodies.

National Institutes of Health (NIH)
National Human Genome Research Institute (NHGRI)
Research Project (R01)
Study Section
Genomics, Computational Biology and Technology Study Section (GCAT)
Program Officer
Pazin, Michael J
Massachusetts Institute of Technology
Organized Research Units
United States
Guo, Yuchun; Tian, Kevin; Zeng, Haoyang et al. (2018) A novel k-mer set memory (KSM) motif representation improves regulatory variant prediction. Genome Res 28:891-900
Shen, Max W; Arbab, Mandana; Hsu, Jonathan Y et al. (2018) Predictable and precise template-free CRISPR editing of pathogenic variants. Nature 563:646-651
Kang, Daniel; Sherwood, Richard; Barkal, Amira et al. (2017) DNase-capture reveals differential transcription factor binding modalities. PLoS One 12:e0187046
Banerjee, Budhaditya; Sherwood, Richard I (2017) A CRISPR view of gene regulation. Curr Opin Syst Biol 1:1-8
Zeng, Haoyang; Edwards, Matthew D; Guo, Yuchun et al. (2017) Accurate eQTL prioritization with an ensemble-based framework. Hum Mutat 38:1259-1265
Rajagopal, Nisha; Srinivasan, Sharanya; Kooshesh, Kameron et al. (2016) High-throughput mapping of regulatory DNA. Nat Biotechnol 34:167-74
Arbab, Mandana; Sherwood, Richard I (2016) Self-Cloning CRISPR. Curr Protoc Stem Cell Biol 38:5B.5.1-5B.5.16
Hashimoto, Tatsunori; Sherwood, Richard I; Kang, Daniel D et al. (2016) A synergistic DNA logic predicts genome-wide chromatin accessibility. Genome Res 26:1430-1440
Zeng, Haoyang; Hashimoto, Tatsunori; Kang, Daniel D et al. (2016) GERV: a statistical method for generative evaluation of regulatory variants for transcription factor binding. Bioinformatics 32:490-6
Ferreira, Leonardo M R; Meissner, Torsten B; Mikkelsen, Tarjei S et al. (2016) A distant trophoblast-specific enhancer controls HLA-G expression at the maternal-fetal interface. Proc Natl Acad Sci U S A 113:5364-9

Showing the most recent 10 out of 11 publications