How cell-type specific gene expression programs are established and maintained is a fundamental question in molecular biology. In mammalian cells, hundreds of sequence-specific transcription factors have been catalogued, and they bind the regulatory regions of their target genes in cell-type specific and combinatorial occupancy patterns. Moreover, the developmental programs that generate different cell lineages are accompanied by complex chromatin remodeling. Increasing evidence suggests that the regulatory regions of cell-type specific genes may often be established, and sometimes "poised" by chromatin marks, at earlier stages in development. However, detailed characterization of gene regulatory regions (including their initial establishment in earlier progenitor cells, the dynamics of their chromatin state, and the combinatorial control of gene transcriptional output by multiple transcription factors) has been carried out for only a handful of developmentally important genes. The goal of this project is to develop new integrative computational methods that exploit massive next-generation sequencing data sets to fundamentally advance our understanding of cell-type specific transcriptional programs. We will develop integrative computational analysis methods for (1) learning the sequence and chromatin determinants of transcription factor binding from ChIP-seq and DNase-seq; (2) mapping the landscape of chromatin accessibility of all regulatory regions in the human and mouse genomes using DNase-seq across all available cell types, dissecting the poising of their chromatin state in earlier progenitor cells, and extracting the sequence code governing their gain and loss in differentiation; and (3) modeling cell-type specific gene expression programs as a function of chromatin state, transcription factor binding, and regulatory sequence analysis.
We will couple our computational methods development with targeted experimental validation, including both locus-specific and genome-wide assays.
Understanding gene regulation is fundamental to the study of normal cellular processes as well as disease. In this project, we develop computational methods to exploit multiple sources of large-scale genomics data enabled by next-generation sequencing technology in order to provide new tools for studying gene regulation in mammalian cells.
|Mason, Christopher E; Porter, Sandra G; Smith, Todd M (2014) Characterizing multi-omic data in systems biology. Adv Exp Med Biol 799:15-38|
|Li, Sheng; Łabaj, Paweł P; Zumbo, Paul et al. (2014) Detecting and correcting systematic variation in large-scale RNA sequencing data. Nat Biotechnol 32:888-95|
|Li, Sheng; Tighe, Scott W; Nicolet, Charles M et al. (2014) Multi-platform assessment of transcriptome profiling using RNA-seq in the ABRF next-generation sequencing study. Nat Biotechnol 32:915-25|
|SEQC/MAQC-III Consortium (2014) A comprehensive assessment of RNA-seq accuracy, reproducibility and information content by the Sequencing Quality Control Consortium. Nat Biotechnol 32:903-14|
|Li, Sheng; Mason, Christopher E (2014) The pivotal regulatory landscape of RNA modifications. Annu Rev Genomics Hum Genet 15:127-50|