Large consortium efforts have collected hundreds of genome-wide datasets that have delineated myriad regulatory regions, transcription factor binding sites and large numbers of coding and non-coding transcripts. Even with this massive amount of data, it remains a significant challenge to determine how the mapped elements function together in regulatory networks. This is due in large part to our inability to accurately and quantitatively detect all forms of nascent transcription, the instantaneous output of transcriptional regulation. Moreover, our understanding of global gene regulation is restricted by a lack of computational tools that seamlessly integrate genome-wide datasets. The overall goal of this proposal is to maximize the impact of nascent transcriptome studies and enable facile integration with other functional genomic data. My group developed native elongating transcript sequencing (NET-seq), which enables strand-specific, nucleotide-resolution mapping of RNA polymerase density, highlighting all transcriptional activity regardless of transcript half-lives and revealing precise positions of Pol II pausing where regulatory control is applied. Here, we will develop a new version of NET-seq, NET-seq 2.0, that enables routine, scalable and flexible application to diverse human cell types (or any eukaryotic system). Moreover, we will increase the potential of NET-seq analysis by developing two innovative bioinformatics strategies, with applications beyond NET-seq studies, that seamlessly integrate NET-seq data with other genome-wide datasets. To demonstrate the broad utility of our integrated approach, we will study regulatory networks and cell differentiation, for which instantaneous nascent transcriptional analysis will be highly impactful.
In Aim 1, our goal is to make NET-seq easier, cheaper, and more flexible. Our improvements will reduce background and increase usable reads, dramatically reduce cell input requirements (100-1000-fold), enable dense, region-specific RNA transcription analyses, and enable quantitative comparisons between samples and conditions.
In Aim 2, we will determine transcription kinetics by integrating NET-seq with metabolic RNA labeling (TT-seq) data, which report local synthesis rates. This integrative approach yields a rich transcriptional phenotype that we will use to develop gene regulatory network models.
In Aim 3, we will create new computational algorithms that circumvent the need to determine each molecular event separately, and instead infer the status of unmapped events using information-rich datasets, such as NET-seq. We will use integrative deep neural networks ("deep learning") that predict unavailable datasets from genome-wide data already on hand. We will apply this approach to study erythropoiesis in a well-defined primary human hematopoietic differentiation system, using time-series NET-seq and DNase-seq analysis. These data will inform deep neural network models that predict ChIP-seq data for myriad transcription factors and chromatin marks, allowing us to investigate key regulatory events without additional expense.
The proposed research is relevant to public health because the discovery of regulatory mechanisms in transcription at high resolution is ultimately expected to significantly impact our understanding of most human diseases. As such, the proposed research is relevant to the part of the NIH's mission that seeks to develop fundamental knowledge to inform our diagnosis and treatment of human disease.