The non-coding (nc) transcriptome remains an under-explored landscape for functional genomics. Recently, ~2,000 long non-coding (lnc)RNAs were identified by the Steitz lab upon exposure of human cells to stress, such as heat, high salt and oxidative stress, while others have confirmed their induction in viral infection, cancer and aging. Called DoGs for "Downstream of Gene" transcripts, these lncRNAs result when RNA polymerase II fails to cleave nascent RNA 3' ends at the annotated site for a subset of protein-coding genes that we term "parent genes". Instead, transcription continues from 5 to 45 kb further downstream, and DoGs are retained in the nucleus. DoG RNAs are expressed on the timescale of minutes upon stress, suggesting they are among the "first responders" to help cells survive. Total DoGs account for 15%-30% of all intergenic transcripts, yet they are not even annotated in the human genome. Taken together, these features define an urgent need to determine the sequence and function of DoG RNAs, which are central goals of this proposal.
In Aim 1, we propose to sequence individual DoGs from their 5' to 3' ends, using emerging long-read sequencing methodology established for polyA+ and polyA- RNA in the Neugebauer lab. We will exploit physiological stresses to induce DoGs by orders of magnitude and optimize library preparation on several platforms to achieve the appropriate sequencing length and depth for all of the parameters we aim to quantify. The data will reveal the actual lengths, 5' and 3' ends and the extent to which DoG RNAs are spliced, modified and polyadenylated. Importantly, we will test our working hypotheses based on preliminary results that splicing and histone post-translational modifications play mechanistic roles in DoG biogenesis. These findings will give us the first concrete clues regarding the cellular machineries impinged upon by stress pathways.
In Aim 2, we propose concurrent functional analyses of DoGs that exploit our recent preliminary finding that DoG production by the mouse interferon-? gene enhances subsequent expression of interferon-? upon exposure to polyIC (a mimic of viral infection). Therefore, we will ask whether other DoGs likewise prime expression of their parent genes upon exposure to a second stress. We will pursue other preliminary results suggesting that DoG parent genes are associated with transcriptional repression and that DoG production has the potential to up- or down-regulate the parent gene. We will probe the mechanism of action of DoGs through analyses of transcription elongation and the chromatin landscape in DoG gene regions with new and published ChIP data. Finally, determination of DoG half-lives before, during and after stress will allow us to explore the conceptually novel possibility that DoGs are repositories for unprocessed pre-mRNAs that are later matured to become active mRNAs during recovery from stress. The achievement of these aims will illuminate the sequences and function(s) of an entirely new class of ncRNA, as well as the gene regions and chromatin environments where transcriptional activity is regulated by cellular stresses. Moreover, entirely novel lncRNA-mediated pathways of gene regulation are likely to be identified.

Public Health Relevance

The human genome sequence lays the foundation for identifying disease-associated mutations and genetic variations, insights into gene regulation, genomic tools, and therapeutics. We and others discovered a novel class of long non-coding (lnc)RNAs called DoGs (for "downstream of genes") expressed when cells experience stress (e.g. heat, high salt, oxidative stress, viral infection and cancer); DoGs are rapidly produced when transcription of normal genes aberrantly continues. We will apply the most recent advances in RNA sequencing technology and functional genomic approaches to discover the role(s) of DoG RNAs in cell physiology.

Agency
National Institutes of Health (NIH)
Institute
National Institute of General Medical Sciences (NIGMS)
Type
Research Project (R01)
Project #
1R01GM140735-01
Application #
10158039
Study Section
Molecular Genetics A Study Section (MGA)
Program Officer
Gaillard, Shawn R
Project Start
2021-01-19
Project End
2024-12-31
Budget Start
2021-01-19
Budget End
2021-12-31
Support Year
1
Fiscal Year
2021
Total Cost
Indirect Cost
Name
Yale University
Department
Anatomy/Cell Biology
Type
Schools of Medicine
DUNS #
043207562
City
New Haven
State
CT
Country
United States
Zip Code