It is said that if the genome represents the set of things that the cell could say, then the transcriptome represents the set of things that the cell is saying right now. The transcriptome is the primary read-out of the genome and is composed mainly of mRNAs, long noncoding RNAs, and microRNAs. At all stages of life, in health and in disease, differences in the transcriptome causally define and distinguish each cell type and cell state. The mission of the ENCODE Consortium is to discover, map and define all genes and their regulatory elements. RNA-Seq data and the resulting transcriptomes have therefore been a prominent component of prior phases of ENCODE, and our groups have contributed over 350 released poly(A) mRNA and microRNA datasets for the project. Valuable and high quality as those data are, new technologies will allow us to produce a new generation of transcriptomes that are much more information-rich and definitive. Specifically, they resolve the multiple different cell types that comprise complex tissues, and they resolve the molecular isoforms in the data. They qualitatively document long-range, single-molecule splicing and end processing, and we provide companion quantification of microRNAs, plus discovery of their often-undocumented precursor RNAs. Finally, we map RNA secondary structure across the transcriptome in vivo. We propose to contribute 300 new, higher precision transcriptomes containing these measurement types and their integrated analysis. A portion of this study focuses on aging humans and mice, a whole life cycle stage that has not previously been studied in ENCODE. The corresponding disease component will contrast the cognitively normal old with Alzheimer's dementia in the brain and in the motor system.

Public Health Relevance

It is said that if the genome represents the set of things that the cell could say, then the transcriptome represents the set of things that the cell is saying right now. At all stages of life, in health and in disease, differences in the transcriptome, which is the set of RNAs expressed in a sample, causally define and distinguish each cell type and cell state. We propose to contribute 300 new, higher precision transcriptomes that describe and quantify full-length transcripts, microRNAs and messenger RNA secondary structures, which will be precious additions to the Encyclopedia of DNA Elements.

Agency
National Institutes of Health (NIH)
Institute
National Human Genome Research Institute (NHGRI)
Type
Research Project with Complex Structure Cooperative Agreement (UM1)
Project #
5UM1HG009443-04
Application #
9867737
Study Section
Special Emphasis Panel (ZHG1)
Program Officer
Feingold, Elise A
Project Start
2017-02-01
Project End
2021-01-31
Budget Start
2020-02-01
Budget End
2021-01-31
Support Year
4
Fiscal Year
2020
Total Cost
Indirect Cost
Name
California Institute of Technology
Department
Type
Schools of Arts and Sciences
DUNS #
009584210
City
Pasadena
State
CA
Country
United States
Zip Code
91125
Zinshteyn, Boris; Chan, Dalen; England, Whitney et al. (2018) Assaying RNA structure with LASER-Seq. Nucleic Acids Res :
Wyman, Dana; Mortazavi, Ali (2018) TranscriptClean: Variant-aware correction of indels, mismatches, and splice junctions in long-read transcripts. Bioinformatics :