How cell-type specific gene expression programs are established and maintained is a fundamental question in molecular biology. In mammalian cells, hundreds of sequence-specific transcription factors have been catalogued, and they bind the regulatory regions of their target genes in cell-type specific and combinatorial occupancy patterns. Moreover, the developmental programs that generate different cell lineages are accompanied by complex chromatin remodeling. Increasing evidence suggests that the regulatory regions of cell-type specific genes may often be established and sometimes "poised" by chromatin marks at earlier stages in development. However, detailed characterization of gene regulatory regions, including their initial establishment in earlier progenitor cells, the dynamics of their chromatin state, and the combinatorial control of gene transcriptional output by multiple transcription factors, has been carried out for only a handful of developmentally important genes. The goal of this project is to develop new integrative computational methods that exploit massive next-generation sequencing data sets to fundamentally advance our understanding of cell-type specific transcriptional programs. We will develop integrative computational analysis methods for (1) learning the sequence and chromatin determinants of transcription factor binding from ChIP-seq and DNase-seq; (2) mapping the landscape of chromatin accessibility of all regulatory regions in the human and mouse genomes using DNase-seq across all available cell types, dissecting the poising of their chromatin state in earlier progenitor cells, and extracting the sequence code governing their gain and loss in differentiation; and (3) modeling cell-type specific gene expression programs as a function of chromatin state, transcription factor binding, and regulatory sequence analysis.
We will couple our computational methods development with targeted experimental validation, including both locus-specific and genome-wide assays.

Public Health Relevance

Understanding gene regulation is fundamental to the study of normal cellular processes as well as disease. In this project, we develop computational methods to exploit multiple sources of large-scale genomics data enabled by next-generation sequencing technology in order to provide new tools for studying gene regulation in mammalian cells.

National Institutes of Health (NIH)
National Human Genome Research Institute (NHGRI)
Research Project (R01)
Study Section
Special Emphasis Panel (ZHG1-HGR-M (J2))
Program Officer
Pazin, Michael J
Sloan-Kettering Institute for Cancer Research
New York
United States
Shih, Alan H; Meydan, Cem; Shank, Kaitlyn et al. (2017) Combination Targeted Therapy to Disrupt Aberrant Oncogenic Signaling and Reverse Epigenetic Dysfunction in IDH2- and TET2-Mutant Acute Myeloid Leukemia. Cancer Discov 7:494-505
Garrett-Bakelman, Francine E; Sheridan, Caroline K; Kacmarczyk, Thadeous J et al. (2015) Enhanced reduced representation bisulfite sequencing for assessment of DNA methylation at base pair resolution. J Vis Exp :e52246
Setty, Manu; Leslie, Christina S (2015) SeqGL Identifies Context-Dependent Binding Signals in Genome-Wide Regulatory Element Maps. PLoS Comput Biol 11:e1004271
González, Alvaro J; Setty, Manu; Leslie, Christina S (2015) Early enhancer establishment and regulatory locus complexity shape transcriptional programs in hematopoietic differentiation. Nat Genet 47:1249-59
Pelossof, Raphael; Singh, Irtisha; Yang, Julie L et al. (2015) Affinity regression predicts the recognition code of nucleic acid-binding proteins. Nat Biotechnol 33:1242-1249
Shih, Alan H; Jiang, Yanwen; Meydan, Cem et al. (2015) Mutational cooperativity linked to combinatorial epigenetic gain of function in acute myeloid leukemia. Cancer Cell 27:502-15
Li, Sheng; Łabaj, Paweł P; Zumbo, Paul et al. (2014) Detecting and correcting systematic variation in large-scale RNA sequencing data. Nat Biotechnol 32:888-95
Li, Sheng; Mason, Christopher E (2014) The pivotal regulatory landscape of RNA modifications. Annu Rev Genomics Hum Genet 15:127-50
SEQC/MAQC-III Consortium (2014) A comprehensive assessment of RNA-seq accuracy, reproducibility and information content by the Sequencing Quality Control Consortium. Nat Biotechnol 32:903-14
Li, Sheng; Garrett-Bakelman, Francine; Perl, Alexander E et al. (2014) Dynamic evolution of clonal epialleles revealed by methclone. Genome Biol 15:472
