How cell-type specific gene expression programs are established and maintained is a fundamental question in molecular biology. In mammalian cells, hundreds of sequence-specific transcription factors have been catalogued, and they bind the regulatory regions of their target genes in cell-type specific and combinatorial occupancy patterns. Moreover, the developmental programs that generate different cell lineages are accompanied by complex chromatin remodeling. Increasing evidence suggests that the regulatory regions of cell-type specific genes may often be established, and sometimes "poised" by chromatin marks, at earlier stages in development. However, the detailed characterization of gene regulatory regions (including their initial establishment in earlier progenitor cells, the dynamics of their chromatin state, and the combinatorial control of gene transcriptional output by multiple transcription factors) has been carried out for only a handful of developmentally important genes. The goal of this project is to develop new integrative computational methods that exploit massive next-generation sequencing data sets to fundamentally advance our understanding of cell-type specific transcriptional programs. We will develop integrative computational analysis methods for (1) learning the sequence and chromatin determinants of transcription factor binding from ChIP-seq and DNase-seq; (2) mapping the landscape of chromatin accessibility of all regulatory regions in the human and mouse genomes using DNase-seq across all available cell types, dissecting the poising of their chromatin state in earlier progenitor cells, and extracting the sequence code governing their gain and loss in differentiation; and (3) modeling cell-type specific gene expression programs as a function of chromatin state, transcription factor binding, and regulatory sequence analysis.
We will couple our computational methods development with targeted experimental validation, including both locus-specific and genome-wide assays.

Public Health Relevance

Understanding gene regulation is fundamental to the study of normal cellular processes as well as disease. In this project, we develop computational methods to exploit multiple sources of large-scale genomics data enabled by next-generation sequencing technology in order to provide new tools for studying gene regulation in mammalian cells.

National Institutes of Health (NIH)
National Human Genome Research Institute (NHGRI)
Research Project (R01)
Study Section: Special Emphasis Panel (ZHG1-HGR-M (J2))
Program Officer: Pazin, Michael J
Sloan-Kettering Institute for Cancer Research
New York
United States
Garrett-Bakelman, Francine E; Sheridan, Caroline K; Kacmarczyk, Thadeous J et al. (2015) Enhanced reduced representation bisulfite sequencing for assessment of DNA methylation at base pair resolution. J Vis Exp :e52246
González, Alvaro J; Setty, Manu; Leslie, Christina S (2015) Early enhancer establishment and regulatory locus complexity shape transcriptional programs in hematopoietic differentiation. Nat Genet 47:1249-59
Setty, Manu; Leslie, Christina S (2015) SeqGL Identifies Context-Dependent Binding Signals in Genome-Wide Regulatory Element Maps. PLoS Comput Biol 11:e1004271
Pelossof, Raphael; Singh, Irtisha; Yang, Julie L et al. (2015) Affinity regression predicts the recognition code of nucleic acid-binding proteins. Nat Biotechnol 33:1242-1249
Shih, Alan H; Jiang, Yanwen; Meydan, Cem et al. (2015) Mutational cooperativity linked to combinatorial epigenetic gain of function in acute myeloid leukemia. Cancer Cell 27:502-15
Li, Sheng; Mason, Christopher E (2014) The pivotal regulatory landscape of RNA modifications. Annu Rev Genomics Hum Genet 15:127-50
Li, Sheng; Łabaj, Paweł P; Zumbo, Paul et al. (2014) Detecting and correcting systematic variation in large-scale RNA sequencing data. Nat Biotechnol 32:888-95
Li, Sheng; Garrett-Bakelman, Francine; Perl, Alexander E et al. (2014) Dynamic evolution of clonal epialleles revealed by methclone. Genome Biol 15:472
SEQC/MAQC-III Consortium (2014) A comprehensive assessment of RNA-seq accuracy, reproducibility and information content by the Sequencing Quality Control Consortium. Nat Biotechnol 32:903-14
Dubchak, Inna; Balasubramanian, Sandhya; Wang, Sheng et al. (2014) An integrative computational approach for prioritization of genomic variants. PLoS One 9:e114903

Showing the most recent 10 out of 18 publications