Complex sensory and signaling pathways ultimately converge on the DNA in the form of complexes of interacting regulatory RNAs and proteins that bind specific gene-proximal and distal regulatory elements to activate and repress genes, resulting in the establishment and maintenance of cell-type specific transcriptional responses. These regulatory elements have static and dynamic components: the former are encoded in the genomic sequence itself as combinations of transcription factor binding site sequence motifs, and the latter are a consequence of locus and cell-type specific chromatin accessibility and epigenomic modifications, as well as the dynamic linking of distal regulatory elements to target genes. In addition, transcriptional and post-transcriptional feedback and control of the levels of regulatory proteins and RNAs play a key role in dynamic cell-type specific gene regulation. One of the central challenges in modern genomics is to learn the rules by which the genome and epigenome encode regulatory information. How does a single genome sequence encode the information for exquisitely precise yet highly distinctive programs of gene regulation for different cell types and at different time points? What are the key sequence determinants and grammatical rules that determine the function of any given DNA sequence under any particular set of conditions? In this project we seek to use public data from the Roadmap Epigenomics Project and the ENCODE Project to make progress on these questions. We propose to build methods for interpreting shared and distinctive regulatory features, especially transcription factor binding, across related cell types.
In parallel, we will implement novel methods for identifying high-resolution, context-specific dynamic regulatory elements and for deciphering their underlying regulatory sequence grammars. We will also learn predictive, integrative models of transcriptional regulation to determine how heterogeneous regulatory components affect lineage-specific gene expression dynamics, and to provide regulatory annotations for large collections of curated and disease-associated gene sets.
The purpose of this project is to develop powerful statistical methods to study cell-type specific gene regulation, and to apply them to publicly available data from the Roadmap Epigenomics Project, the ENCODE Project, and several gene expression compendia. We propose new methods for joint inference of regulatory elements across diverse cell types, for deciphering the lineage-specific regulatory sequence grammars encoded in these elements, and for learning integrative transcriptional regulation programs. Our models will serve as a resource providing comprehensive, context-specific regulatory annotations for disease-associated gene sets and co-expression modules.