The ENCODE consortium analyzed RNA produced from 1% of the genome. It reported that the transcriptome is highly complex and that approximately 93% of the genome is transcribed into RNA. The current paradigm is that this RNA is "non-protein-coding" and likely plays an as-yet-unknown regulatory role in the cell. We hypothesize that much of this RNA is not only transcribed but also translated into protein. To support our hypothesis, we translated the RNA that ENCODE discovered into protein sequence in silico and used the result to construct a database for searching mass spectra derived from the proteomes of five different cell lines. By starting with the proteome and working backwards toward RNA, we have obtained preliminary data indicating that polyadenylated RNA transcripts from intergenic space are translated. We propose to discover and identify peptides and small proteins translated from ENCODE RNA (1% of the genome, 30 Mb) and to create a proteomic map that helps connect the proteome to the transcriptome. This map would also help clarify the complexities of the transcriptome. If our hypothesis proves correct, it would change current thinking about intergenic space and gene number, and it would provide an opportunity to study new genes and new proteins with new functions.
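The in silico translation step described above can be illustrated with a minimal sketch. It assumes that each transcript is translated in its three forward reading frames with the standard genetic code, and that every stop-to-stop segment above a length cutoff becomes a candidate database entry. The function names, the `min_length` cutoff, and the choice of three forward frames are illustrative assumptions, not a description of the pipeline actually used.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate_frame(rna, frame):
    """Translate one forward reading frame (0, 1, or 2); '*' marks a stop codon."""
    seq = rna.upper().replace("U", "T")
    codons = (seq[i:i + 3] for i in range(frame, len(seq) - 2, 3))
    # Codons containing ambiguous bases are reported as 'X'.
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def candidate_peptides(rna, min_length=7):
    """Return (frame, sequence) pairs for stop-to-stop segments of a transcript.

    min_length is a hypothetical cutoff chosen only for illustration.
    """
    peptides = []
    for frame in range(3):
        for segment in translate_frame(rna, frame).split("*"):
            if len(segment) >= min_length:
                peptides.append((frame, segment))
    return peptides
```

Under these assumptions, the sequences returned by `candidate_peptides` for every transcript would be pooled and written out as the protein database against which the mass spectra are searched.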
In humans, the number of genes was once estimated to be much higher than the current estimate of ~21,000. Because recent discoveries show that the genome is pervasively transcribed, we believe that much of the RNA in a cell is also translated into protein. If this is true, our findings would fuel the discovery of many new genes with new functions.