Annotations of coding genes in the human genome have been tremendously useful in understanding the etiology of genetic disorders and in basic biology research. Although coding genes are the most accurately and comprehensively annotated set of genomic features, emerging evidence indicates that an increasing number of translated regions are missing from the current annotation. These overlooked genomic regions, formally termed translated open reading frames (tORFs), represent important biology missing from the current literature. For example, myoregulin, a conserved 46-amino-acid micropeptide, was discovered in a "non-coding" region and was later demonstrated to function in regulating skeletal muscle in mice. These potentially functional novel tORFs are often small and are therefore overlooked by most coding gene annotation programs. To overcome this challenge, efforts leveraging functional genomics datasets to identify novel coding regions across the human genome have begun to reveal this previously underappreciated class of genomic features. In particular, the applicants previously developed a computational method, riboHMM, which leverages patterns specific to translated regions in functional genomics data, such as ribo-seq data, to identify tORFs genome-wide. Systematic annotation of tORFs in human lymphoblastoid cell lines with riboHMM identified 7,273 novel tORFs, in addition to the tORFs of known coding genes. These novel tORFs were found in regions of the transcriptome previously annotated as non-coding (e.g., untranslated regions and lincRNAs). Although newly developed methods such as riboHMM can now systematically identify thousands of previously overlooked tORFs, the biological relevance of these translation events remains unclear. The objective of the current proposal is to evaluate the functional relevance of these newly discovered tORFs. Three major aspects of biological importance will be evaluated. First, loss-of-function impact.
The effects of tORF deletion on cell viability, and its synthetic fitness impact in combination with well-characterized coding genes, will be evaluated using pooled CRISPR dropout screens (Aim 1). Second, ability to encode a protein or peptide. The ability of tORFs to produce stable proteins or peptides will be evaluated in mass spectrometry studies designed to detect translation products of small ORFs (Aim 2). Third, evolutionary conservation. The strength of purifying selection on these loci will be carefully evaluated using new alignments based on independently annotated novel tORFs in chimpanzee and rhesus macaque. Completion of the proposed aims will provide the first systematic evaluation of biological relevance for novel tORFs. The impact of these new functional annotations could range from providing new interpretations for GWAS hits to reevaluating "non-coding RNA" function. Results from the proposed study will guide future research on this group of previously overlooked genomic features. Given the sheer number of unexplored tORFs and the prior examples of overlooked tORFs that turned out to play critical roles in important biological pathways, the findings here will have far-reaching implications for both basic and translational biomedical research.

Public Health Relevance

The proposed project aims to provide the first systematic functional characterization of a novel category of genomic features, translated small open reading frames. This research is relevant to public health because new functional annotations of these regions of the human genome will clarify the coding and non-coding classification of the human transcriptome, which will be useful in interpreting molecular mechanisms and predicting the functional impact of genetic variants linked to these loci.

National Institutes of Health (NIH)
National Institute of General Medical Sciences (NIGMS)
Research Project (R01)
Study Section
Genomics, Computational Biology and Technology Study Section (GCAT)
Program Officer
Brown, Anissa F
University of Texas Health Science Center Houston
Schools of Public Health
United States