During the last decade, a plethora of novel transcripts has been uncovered, many of which come from regions formerly considered to be "junk" DNA. The characterization of these RNA species during development and disease has led to a burgeoning field within the biology of gene expression. In this proposal, we take initial steps to chart a similar path for the proteome. We will develop the computational infrastructure and provide a proof of principle for the existence of novel peptides derived from regions of the genome that are traditionally considered non-coding. We will use the new algorithms and tools to identify and characterize novel peptides derived from presumed non-coding regions across different developmental conditions, using the mouse brain as a model system. The framework will include software tools that allow researchers to build custom databases from RNA-seq experiments, making it possible to search for translation products without relying on annotation databases. We will carry out rigorous validation experiments, systematically characterize the novel non-canonical translation events, and probe the function of selected novel proteins. The proposed research has the potential to shift the paradigm in proteomics, because researchers will no longer be limited by annotated databases. Since both mass spectrometry and RNA-seq experiments are now practical and no longer cost-prohibitive for most labs, the proposed framework will be of general use and will help clarify the relationship between the genome, the transcriptome, and the proteome across diseases and biological paradigms.
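The abstract does not specify how the custom databases are built from RNA-seq data. A common proteogenomic approach, assumed here rather than taken from the proposal, is to translate assembled transcripts in all six reading frames and keep each open reading frame (ORF) as a candidate peptide for the mass-spectrometry search. The sketch below illustrates that idea only; the function names, the `min_len` cutoff, and the FASTA header format are hypothetical choices.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), built from the canonical
# TCAG-ordered amino-acid string; '*' marks stop codons.
_BASES = "TCAG"
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(_BASES, repeat=3), _AMINO)}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq: str) -> str:
    """Translate complete codons from position 0; ambiguous codons become 'X'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def find_orfs(seq: str, min_len: int = 20):
    """Yield Met-initiated, stop-terminated peptides from all six reading frames.

    min_len (in residues) is a hypothetical cutoff for this illustration.
    """
    seq = seq.upper().replace("U", "T")
    for strand in (seq, reverse_complement(seq)):
        for frame in range(3):
            protein = translate(strand[frame:])
            # Every segment except the last is followed by a stop codon.
            for segment in protein.split("*")[:-1]:
                start = segment.find("M")
                if start != -1 and len(segment) - start >= min_len:
                    yield segment[start:]


def build_custom_database(transcripts: dict, min_len: int = 20) -> str:
    """Return a FASTA-formatted string with one entry per candidate ORF."""
    entries = []
    for tx_id, seq in transcripts.items():
        for n, peptide in enumerate(find_orfs(seq, min_len), start=1):
            entries.append(f">{tx_id}|orf{n}\n{peptide}\n")
    return "".join(entries)


print(build_custom_database({"tx1": "ATGGCTTGGTAA"}, min_len=3))
```

For the toy transcript `ATGGCTTGGTAA`, only the first forward frame contains a start codon followed by a stop, so the database holds a single entry, `MAW`. Searching all six frames matters because many RNA-seq libraries are unstranded, so the orientation of an unannotated transcript may not be known.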
There are thousands of RNAs derived from unannotated regions of the genome in mammalian cells, and although these transcripts are typically labeled as non-coding, we recently found translation products derived from these non-canonical transcripts. In this proposal, we will investigate these novel translation products in the context of neuronal plasticity, and examine the relationship between transcription and translation, by developing a framework for integrating data from high-throughput RNA-seq and proteomics experiments. This work will make it possible to integrate transcriptomics and proteomics across diseases and biological paradigms.
Tang, Shaojun; Hemberg, Martin; Cansizoglu, Ertugrul et al. (2016) f-divergence cutoff index to simultaneously identify differential expression in the integrated transcriptome and proteome. Nucleic Acids Res 44:e97