During the last decade, a plethora of novel transcripts has been uncovered, many of which come from regions formerly considered to constitute junk DNA. The characterization of these RNA species during development and disease has led to a burgeoning field within the biology of gene expression. In this proposal, we take initial steps to chart a similar path for the proteome. We will develop the computational infrastructure and provide a proof of principle for the existence of novel peptides derived from regions of the genome that are traditionally considered to be non-coding. We will use the new algorithms and tools to begin to identify and define novel peptides derived from presumed non-coding regions across different developmental conditions, using the mouse brain as a model system. The framework will include software tools that allow researchers to build custom databases from RNA-seq experiments, making it possible to search for translation products without relying on annotation databases. We will carry out rigorous validation experiments and a systematic characterization of the novel non-canonical translation events, and we will probe the function of selected novel proteins. The proposed research has the potential to shift the paradigm in proteomics, as researchers will no longer be limited by annotated databases. Since both mass spectrometry and RNA-seq experiments are now practical and no longer cost-prohibitive for most labs, the proposed framework should be of general use and will be important for understanding the relationship between the genome, transcriptome, and proteome across diseases and biological paradigms.
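The idea of building a custom search database from RNA-seq data can be illustrated with a minimal sketch. It assumes that transcripts have already been assembled from RNA-seq reads; the assembly step is not shown. Each transcript is translated in all six reading frames with the standard genetic code and split at stop codons. Every stop-to-stop segment above a minimum length is then written as a FASTA entry that a mass-spectrometry search engine could use. The function names, the `min_len` cutoff, and the FASTA header format are hypothetical illustrations. They are not the software tools described in this proposal.

```python
from itertools import product
from typing import Dict, Iterator, Tuple

# Standard genetic code (NCBI translation table 1), built from the canonical
# TCAG-ordered amino-acid string; '*' marks stop codons.
_BASES = "TCAG"
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(_BASES, repeat=3), _AMINO)}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (N stays N)."""
    return seq.translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def translate(seq: str) -> str:
    """Translate complete codons; codons with ambiguous bases become 'X'."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))


def candidate_peptides(transcript: str, min_len: int = 10) -> Iterator[Tuple[str, int, str]]:
    """Yield (strand, frame, peptide) for stop-to-stop segments in six frames.

    Stop-to-stop segments (rather than ATG-initiated ORFs) are kept so that
    products of non-AUG starts are not excluded; this is a hypothetical choice.
    """
    seq = transcript.upper().replace("U", "T")  # accept RNA or DNA input
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            for segment in translate(s[frame:]).split("*"):
                if len(segment) >= min_len:
                    yield strand, frame, segment


def build_fasta(transcripts: Dict[str, str], min_len: int = 10) -> str:
    """Write all candidate peptides as FASTA; the header format is hypothetical."""
    lines = []
    for tx_id, tx_seq in transcripts.items():
        for strand, frame, pep in candidate_peptides(tx_seq, min_len):
            lines.append(f">{tx_id}|{strand}{frame}|len={len(pep)}")
            lines.append(pep)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(build_fasta({"tx1": "ATGAAACCCGGGTTTTAA"}, min_len=5), end="")
```

Stop-to-stop segments make the database more inclusive than ATG-initiated ORFs, but the database also grows larger. A larger search space generally affects peptide-level false discovery rate estimation. A real pipeline would therefore need to weigh sensitivity against database size.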

Public Health Relevance

There are thousands of RNAs derived from unannotated regions of the genome in mammalian cells, and although these transcripts are typically labeled as non-coding, we recently found translation products derived from them. In this proposal, we will investigate these novel translation products in the context of neuronal plasticity, and we will study the relationship between transcription and translation by developing a framework for integrating data from high-throughput RNA-seq and proteomics experiments. This work will make it possible to integrate transcriptomics and proteomics data across diseases and biological paradigms.

Agency
National Institutes of Health (NIH)
Institute
National Institute of General Medical Sciences (NIGMS)
Type
Research Project (R01)
Project #
1R01GM112007-01A1
Application #
8886760
Study Section
Enabling Bioanalytical and Imaging Technologies Study Section (EBIT)
Program Officer
Edmonds, Charles G
Project Start
2015-04-01
Project End
2019-03-31
Budget Start
2015-04-01
Budget End
2016-03-31
Support Year
1
Fiscal Year
2015
Total Cost
$545,039
Indirect Cost
$236,236
Organization Name
Children's Hospital Boston
DUNS #
076593722
City
Boston
State
MA
Country
United States
Zip Code
02115
Tang, Shaojun; Hemberg, Martin; Cansizoglu, Ertugrul et al. (2016) f-divergence cutoff index to simultaneously identify differential expression in the integrated transcriptome and proteome. Nucleic Acids Res 44:e97