The completion of the human genome sequence was expected to provide a definitive list of all protein-coding genes in cells and tissues; however, the recent discovery of hundreds, potentially thousands, of novel protein-coding genes has revealed an unexpected blind spot in gene annotation for small protein-coding open reading frames (smORFs). The goals of this application are to 1) annotate all human and mouse smORFs (i.e., the smORFeome) (Aim 1), and 2) characterize the biochemical (Aim 2) and cellular (Aim 3) functions of selected smORFs. Achieving these objectives will accomplish the larger goal of defining the protein-coding capacity of the human genome and identifying additional genes with critical functions in biology and disease.
Aim 1a uses an innovative strategy that integrates next-generation sequencing, bioinformatics, and proteomics to define the human smORFeome in several human cell lines.
In Aim 1b, the same approach is applied to mouse tissues to generate the first smORF Tissue Atlas. This Atlas will enable the rapid identification of smORFs with unique patterns of expression or smORFs that are regulated during a disease, thereby accelerating the physiological characterization of these genes. There is extensive overlap between the mouse and human smORFs identified to date, so conclusions from this work should readily extend to human disease. smORF-encoded peptides and small proteins are referred to as microproteins, and in Aims 2a and 2b, two different proteomics strategies will be applied to identify microprotein-protein interactions (PPIs). Microprotein PPIs will reveal the biochemical function(s) of smORFs and accelerate the biological characterization of these genes. Lastly, in Aim 3, experiments will be performed to define the biological role of PIGBOS, a smORF with an integral role in the unfolded protein response. The successful completion of these Aims will increase basic knowledge regarding the protein-coding potential of the human and mouse genomes, and provide functional insights that will lead to improvements in disease diagnosis, treatment, and prevention.

Public Health Relevance

Proteins mediate most cellular and physiological biochemistry, and the characterization of all human proteins is of paramount importance for understanding human biology and disease. The primary goal of this application is to annotate and functionally characterize a large group of protein-coding genes that were missed by the Human Genome Project. The successful completion of this goal will increase scientific knowledge regarding the protein-coding capacity of the genome and provide insights that will improve disease diagnosis, treatment, and prevention.

National Institutes of Health (NIH)
National Institute of General Medical Sciences (NIGMS)
Research Project (R01)
Study Section: Enabling Bioanalytical and Imaging Technologies Study Section (EBIT)
Program Officer: Wu, Mary Ann
Salk Institute for Biological Studies
La Jolla
United States
D'Lima, Nadia G; Ma, Jiao; Winkler, Lauren et al. (2017) A human microprotein that interacts with the mRNA decapping complex. Nat Chem Biol 13:174-180
Saghatelian, Alan (2017) Novel Biology and Druggable Targets via Chemoproteomics. Biochemistry 56:6515-6516
Arnoult, Nausica; Correia, Adriana; Ma, Jiao et al. (2017) Regulation of DNA repair pathway choice in S and G2 phases by the NHEJ inhibitor CYREN. Nature 549:548-552
Chu, Qian; Rathore, Annie; Diedrich, Jolene K et al. (2017) Identification of Microprotein-Protein Interactions via APEX Tagging. Biochemistry 56:3299-3306
Ma, Jiao; Diedrich, Jolene K; Jungreis, Irwin et al. (2016) Improved Identification and Analysis of Small Open Reading Frame Encoded Polypeptides. Anal Chem 88:3967-75
Saghatelian, Alan; Couso, Juan Pablo (2015) Discovery and characterization of smORF-encoded bioactive polypeptides. Nat Chem Biol 11:909-16
Chu, Qian; Ma, Jiao; Saghatelian, Alan (2015) Identification and characterization of sORF-encoded polypeptides. Crit Rev Biochem Mol Biol 50:134-41
Pauli, Andrea; Norris, Megan L; Valen, Eivind et al. (2014) Toddler: an embryonic signal that promotes cell movement via Apelin receptors. Science 343:1248636
Slavoff, Sarah A; Heo, Jinho; Budnik, Bogdan A et al. (2014) A human short open reading frame (sORF)-encoded polypeptide that stimulates DNA end joining. J Biol Chem 289:10950-7
Ma, Jiao; Ward, Carl C; Jungreis, Irwin et al. (2014) Discovery of human sORF-encoded polypeptides (SEPs) in cell lines and tissue. J Proteome Res 13:1757-65
