Large consortium efforts have collected hundreds of genome-wide datasets that have delineated myriad regulatory regions, transcription factor binding sites and large numbers of coding and non-coding transcripts. Even with this massive amount of data, it remains a significant challenge to determine how the mapped elements function together in regulatory networks. This is due in large part to our inability to accurately and quantitatively detect all forms of nascent transcription, the instantaneous output of transcriptional regulation. Moreover, our understanding of global gene regulation is restricted by a lack of computational tools that seamlessly integrate genome-wide datasets. The overall goal of this proposal is to maximize the impact of nascent transcriptome studies and enable facile integration with other functional genomic data. My group developed native elongating transcript sequencing (NET-seq), which enables strand-specific, nucleotide-resolution mapping of RNA polymerase density, highlighting all transcriptional activity regardless of transcript half-lives and revealing the precise positions of Pol II pausing where regulatory control is applied. Here, we will develop a new version of NET-seq, NET-seq 2.0, that enables routine, scalable and flexible application to diverse human cell types (or any eukaryotic system). Moreover, we will increase the potential of NET-seq analysis by developing two innovative bioinformatics strategies that seamlessly integrate NET-seq data with other genome-wide datasets; these strategies will have applications beyond NET-seq studies. To demonstrate the broad utility of our integrated approach, we will study regulatory networks and cell differentiation, for which instantaneous nascent transcriptional analysis will be highly impactful.
In Aim 1, our goal is to make NET-seq easier, cheaper, and more flexible. Our improvements will reduce background and increase usable reads, dramatically reduce cell input requirements (100- to 1000-fold), enable dense, region-specific analyses of RNA transcription, and enable quantitative comparisons between samples and conditions.
In Aim 2, we will determine transcription kinetics by integrating NET-seq with metabolic RNA labeling (TT-seq) data, which report local synthesis rates. This integrative approach yields a rich transcriptional phenotype that we will use to develop gene regulatory network models.
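To illustrate how a density measurement and a synthesis-rate measurement might be combined, the following is a minimal sketch that assumes a steady-state flux relation (synthesis rate = polymerase density × elongation velocity). The abstract does not specify the kinetic model, so this relation, the function name `apparent_elongation_velocity`, the per-window inputs, and the arbitrary units are all illustrative assumptions rather than the proposal's method.

```python
def apparent_elongation_velocity(synthesis_rate, pol2_density, min_density=1e-9):
    """Estimate a per-window apparent elongation velocity.

    Assumption (not from the proposal): at steady state,
    flux = density * velocity, so velocity = flux / density.

    synthesis_rate: per-window synthesis signal (e.g. TT-seq derived), arbitrary units
    pol2_density:   per-window Pol II density (e.g. NET-seq derived), arbitrary units
    Returns a list with one velocity per window, or None where the density
    is too low for the ratio to be meaningful.
    """
    if len(synthesis_rate) != len(pol2_density):
        raise ValueError("both tracks must cover the same windows")
    velocities = []
    for flux, density in zip(synthesis_rate, pol2_density):
        if density > min_density:
            velocities.append(flux / density)
        else:
            velocities.append(None)  # no polymerase signal in this window
    return velocities


# Hypothetical values: constant synthesis, with Pol II piling up in window 2.
example = apparent_elongation_velocity([2.0, 2.0, 2.0], [1.0, 4.0, 0.0])
print(example)  # [2.0, 0.5, None]
```

Under this assumption, a window where density rises while synthesis stays constant yields a lower apparent velocity, which is the kind of signature one would expect at a pause site.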
In Aim 3, we will create new computational algorithms that circumvent the need to determine each molecular event separately, and instead infer the status of unmapped events using information-rich datasets, such as NET-seq. We will use integrative deep neural networks ("deep learning") that use available genome-wide datasets to predict unavailable ones. We will apply this approach to study erythropoiesis in a well-defined primary human hematopoietic differentiation system through time-series NET-seq and DNase-seq analysis. These data will inform deep neural network models that predict ChIP-seq data for myriad transcription factors and chromatin marks, allowing key regulatory events to be investigated without additional expense.
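The idea of predicting an unmeasured track from measured ones can be sketched with a toy model. The example below is a small one-hidden-layer network in pure Python, far smaller than the deep networks the proposal describes, trained on synthetic data. The class `TinyMLP`, the two input features (standing in for per-window NET-seq and DNase-seq signal), the binary "bound/unbound" label, and the rule that generates the labels are all hypothetical and chosen only to make the example self-contained.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class TinyMLP:
    """One hidden tanh layer and a sigmoid output, trained by per-sample SGD."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
                   for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    def forward(self, x):
        h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(self.w1, self.b1)]
        p = sigmoid(sum(w * hj for w, hj in zip(self.w2, h)) + self.b2)
        return h, p

    def train_step(self, x, y, lr):
        h, p = self.forward(x)
        d_out = p - y  # cross-entropy gradient w.r.t. the output pre-activation
        for j, hj in enumerate(h):
            # Backpropagate through tanh using the not-yet-updated output weight.
            d_hidden = d_out * self.w2[j] * (1.0 - hj * hj)
            self.w2[j] -= lr * d_out * hj
            for i, xi in enumerate(x):
                self.w1[j][i] -= lr * d_hidden * xi
            self.b1[j] -= lr * d_hidden
        self.b2 -= lr * d_out


def mean_loss(model, data):
    """Mean binary cross-entropy over (features, label) pairs."""
    eps = 1e-12
    total = 0.0
    for x, y in data:
        _, p = model.forward(x)
        total -= y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
    return total / len(data)


def demo(n_windows=200, epochs=200, lr=0.1):
    """Train on synthetic windows; return the loss before and after training."""
    rng = random.Random(1)
    data = []
    for _ in range(n_windows):
        net_signal = rng.random()    # stand-in for NET-seq signal in a window
        dnase_signal = rng.random()  # stand-in for DNase-seq signal in a window
        # Hypothetical rule: a factor is "bound" when the combined signal is high.
        label = 1 if net_signal + dnase_signal > 1.0 else 0
        data.append(([net_signal, dnase_signal], label))
    model = TinyMLP(n_in=2, n_hidden=4)
    before = mean_loss(model, data)
    for _ in range(epochs):
        for x, y in data:
            model.train_step(x, y, lr)
    return before, mean_loss(model, data)
```

Calling `demo()` returns the training loss before and after fitting. A real application would use many more input tracks, sequence context around each position, held-out chromosomes for evaluation, and an established deep-learning library.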

Public Health Relevance

The proposed research is relevant to public health because the discovery of regulatory mechanisms in transcription at high resolution is ultimately expected to significantly advance our understanding of most human diseases. As such, the proposed research is relevant to the part of the NIH's mission that seeks to develop fundamental knowledge to inform the diagnosis and treatment of human disease.

Agency
National Institutes of Health (NIH)
Institute
National Human Genome Research Institute (NHGRI)
Type
Research Project (R01)
Project #
2R01HG007173-06A1
Application #
9521770
Study Section
Genomics, Computational Biology and Technology Study Section (GCAT)
Program Officer
Chadwick, Lisa
Project Start
2013-04-01
Project End
2022-05-31
Budget Start
2018-08-10
Budget End
2019-05-31
Support Year
6
Fiscal Year
2018
Total Cost
Indirect Cost
Name
Harvard Medical School
Department
Genetics
Type
Schools of Medicine
DUNS #
047006379
City
Boston
State
MA
Country
United States
Zip Code
Publications
Mischo, Hannah E; Chun, Yujin; Harlen, Kevin M et al. (2018) Cell-Cycle Modulation of Transcription Termination Factor Sen1. Mol Cell 70:312-326.e7
Doris, Stephen M; Chuang, James; Viktorovskaya, Olga et al. (2018) Spt6 Is Required for the Fidelity of Promoter Selection. Mol Cell 72:687-699.e6
Harlen, Kevin M; Churchman, L Stirling (2017) Subgenic Pol II interactomes identify region-specific transcription elongation regulators. Mol Syst Biol 13:900
Jin, Yi; Eser, Umut; Struhl, Kevin et al. (2017) The Ground State and Evolution of Promoter Region Directionality. Cell 170:889-898.e10
Harlen, Kevin M; Churchman, L Stirling (2017) The code and beyond: transcription regulation by the RNA polymerase II carboxy-terminal domain. Nat Rev Mol Cell Biol 18:263-273
Mayer, Andreas; Landry, Heather M; Churchman, L Stirling (2017) Pause & go: from the discovery of RNA polymerase pausing to its functional implications. Curr Opin Cell Biol 46:72-80
Winter, Georg E; Mayer, Andreas; Buckley, Dennis L et al. (2017) BET Bromodomain Proteins Function as Master Transcription Elongation Factors Independent of CDK9 Recruitment. Mol Cell 67:5-18.e19
Boswell, Sarah A; Snavely, Andrew; Landry, Heather M et al. (2017) Total RNA-seq to identify pharmacological effects on specific stages of mRNA synthesis. Nat Chem Biol 13:501-507
Mayer, Andreas; Churchman, L Stirling (2017) A Detailed Protocol for Subcellular RNA Sequencing (subRNA-seq). Curr Protoc Mol Biol 120:4.29.1-4.29.18
Mayer, Andreas; Churchman, L Stirling (2016) Genome-wide profiling of RNA polymerase transcription at nucleotide resolution in human cells with native elongating transcript sequencing. Nat Protoc 11:813-33

Showing the most recent 10 out of 14 publications