To develop effective therapies for disease and to enhance health, an accurate understanding of the biological pathways that underpin physiology is required. The components of these pathways originally included genes, proteins, small molecule substrates, and signaling molecules, but have since grown to include new components, such as non-coding RNAs, that have changed the understanding of how these pathways function. Recent investigations of transcriptomes and proteomes across many organisms have revealed yet another component that was previously overlooked: protein-coding small open reading frames (smORFs), defined here as containing <150 codons. Initial characterization of smORFs has shown them to function in critical processes such as development, metabolism, and DNA repair; however, hundreds or possibly more remain uncharacterized. The goals of this application are to annotate all human smORFs across three cell lines (Aim 1a), to explore these smORFs' involvement in the regulation of critical pathways, including inflammation and insulin signaling (Aim 1b), and to establish high-confidence interacting partners of selected smORF-encoded proteins, referred to as microproteins, which will aid future functional characterization studies (Aim 2).
Aim 1a combines RNA-Seq for de novo transcript assembly, genome-wide ribosome profiling (Ribo-Seq) to identify translated non-annotated smORFs, and targeted mass spectrometry to validate candidate smORFs in human HEK293T cells, HeLa-S3 cervical carcinoma cells, and GM12878 B-lymphoblastoid cells.
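As an illustration of the <150-codon definition only, the following minimal sketch scans a transcript sequence for candidate smORFs. It is not the pipeline used in this application: the function name `find_smorfs`, the restriction to ATG start codons on the forward strand, and the choice to count codons excluding the stop codon are all assumptions made for the example.

```python
# Illustrative sketch only; names and conventions here are hypothetical.
STOP_CODONS = {"TAA", "TAG", "TGA"}
MAX_CODONS = 150  # smORF cutoff from the text: fewer than 150 codons


def find_smorfs(seq, max_codons=MAX_CODONS):
    """Return (start, end, n_codons) for ATG-initiated ORFs with fewer
    than max_codons codons, in the three forward reading frames.

    Assumptions: codons are counted from ATG up to, but not including,
    the stop codon; ORFs without an in-frame stop are ignored.
    """
    seq = seq.upper()
    hits = []
    for frame in range(3):
        i = frame
        while i + 3 <= len(seq):
            if seq[i:i + 3] != "ATG":
                i += 3
                continue
            # Walk codon by codon from the start until an in-frame stop.
            j = i
            while j + 3 <= len(seq):
                if seq[j:j + 3] in STOP_CODONS:
                    n_codons = (j - i) // 3
                    if n_codons < max_codons:
                        hits.append((i, j + 3, n_codons))
                    break
                j += 3
            # Resume scanning after this ORF (or at the end of the sequence).
            i = j + 3
    return hits


if __name__ == "__main__":
    toy_transcript = "CC" + "ATG" + "GCT" * 5 + "TAA" + "GG"
    print(find_smorfs(toy_transcript))
```

A sequence-only scan like this yields candidates; in the approach described above, evidence of translation comes from Ribo-Seq and validation from targeted mass spectrometry.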
In Aim 1b, these newly identified smORFs will be analyzed for changes in mRNA expression across published RNA-Seq studies of inflammation and insulin signaling to determine which smORFs play a role in associated diseases, such as diabetes.
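The kind of expression comparison described in Aim 1b can be sketched as a log2 fold change between two conditions. This is a simplified illustration under stated assumptions, not the analysis method of this application: the function names, the pseudocount of 1, the threshold of 1 (a two-fold change), and the smORF identifiers in the usage example are all hypothetical, and a real analysis would use a statistical framework that models replicate variance.

```python
import math


def log2_fold_change(control, treated, pseudocount=1.0):
    """Log2 ratio of mean normalized expression, treated over control.

    The pseudocount (an assumption of this sketch) avoids division by
    zero for transcripts with no reads in one condition.
    """
    mean_control = sum(control) / len(control)
    mean_treated = sum(treated) / len(treated)
    return math.log2((mean_treated + pseudocount) / (mean_control + pseudocount))


def flag_changed(expression, threshold=1.0):
    """Return IDs whose absolute log2 fold change meets the threshold.

    `expression` maps a smORF ID to a (control_values, treated_values) pair.
    """
    return sorted(
        smorf_id
        for smorf_id, (control, treated) in expression.items()
        if abs(log2_fold_change(control, treated)) >= threshold
    )


if __name__ == "__main__":
    # Hypothetical normalized counts, not data from any study.
    toy = {
        "smORF_A": ([3.0, 3.0], [15.0, 15.0]),
        "smORF_B": ([10.0, 12.0], [11.0, 11.0]),
    }
    print(flag_changed(toy))
```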
In Aim 2, microprotein:protein interactions will be investigated by immunoprecipitation of FLAG-tagged microproteins coupled to mass spectrometry as a means to identify associated protein complexes. As an alternative and complementary method to immunoprecipitation, microprotein:APEX2 fusions will also be used to induce covalent attachment to microprotein binding partners intracellularly. In preliminary experiments, 2,099 non-annotated smORFs have been identified by Ribo-Seq in HEK293T cells. Of these smORFs, 50 are conserved in mice and will help make up the initial batch of microproteins for interaction studies, because conserved genes are more likely to be biologically active. Following identification of interacting proteins, the direct microprotein binding sites and partners will be determined by alanine scanning mutagenesis and a synthetic benzoyl phenylalanine-containing photocrosslinkable binding site probe. Achieving these objectives will accomplish the larger goal of defining the protein-coding capacity of the human genome and identifying additional genes with critical functions in biology and disease.

Public Health Relevance

Proteins mediate most cellular and physiological biochemistry; therefore, the characterization of all human proteins is of paramount importance for understanding human biology and disease. The primary goal of this application is to annotate and assess the functional roles of a large group of protein-coding genes that were overlooked by human genome annotation efforts. The successful completion of this goal will increase scientific knowledge regarding the protein-coding capacity of the genome and provide insights that will improve disease diagnosis, treatment, and prevention.

National Institutes of Health (NIH)
National Institute of General Medical Sciences (NIGMS)
Postdoctoral Individual National Research Service Award (F32)
Study Section
Special Emphasis Panel (ZRG1)
Program Officer
Bond, Michelle Rueffer
Salk Institute for Biological Studies
La Jolla
United States