This application describes a novel approach for discovering non-annotated short open reading frame-encoded peptides and small proteins (SEPs), an understudied class of peptides encoded in the human genome. Applying this approach to a human leukemia cell line revealed 32 novel human SEPs, the largest number ever reported. Because SEPs are produced from short open reading frames (sORFs) in the genome, this discovery also represents the characterization of 32 new human genes. Analysis of the SEP-producing sORFs revealed several notable features of mammalian genes, such as the existence of polycistronic genes, the use of non-ATG start codons to initiate protein synthesis, and the finding that some 'non-coding RNAs' have been misannotated because they actually encode peptides. Likewise, some of the SEPs have features typically found in proteins, such as the ability to localize to specific subcellular compartments and to participate in protein-protein interactions, which indicates that they may serve functional roles in the cell. One of these newly discovered SEPs, for instance, interacts specifically with a known regulator of cancer cell proliferation, suggesting a potential function for this SEP in cell growth. The discovery of these SEPs is significant because it indicates that the human genome and proteome are larger than previously anticipated and demonstrates the need for additional investigation of these unique human genes. The goals of this application are to discover and characterize SEPs and to explore their biology, including any role in disease.

Public Health Relevance

This application details the analysis of a leukemia cell line using a novel approach that led to the discovery of a new group of human genes that encode peptides. This is a significant finding because it indicates that the human genome and proteome are larger than previously appreciated and may contain non-annotated genes with important functions. In this application we endeavor to discover, validate, and functionally characterize these novel human genes, including their roles in disease.

National Institutes of Health (NIH)
National Institute of General Medical Sciences (NIGMS)
Research Project (R01)
Study Section
Enabling Bioanalytical and Imaging Technologies Study Section (EBIT)
Program Officer
Edmonds, Charles G
Salk Institute for Biological Studies
La Jolla
United States
D'Lima, Nadia G; Ma, Jiao; Winkler, Lauren et al. (2017) A human microprotein that interacts with the mRNA decapping complex. Nat Chem Biol 13:174-180
Saghatelian, Alan (2017) Novel Biology and Druggable Targets via Chemoproteomics. Biochemistry 56:6515-6516
Arnoult, Nausica; Correia, Adriana; Ma, Jiao et al. (2017) Regulation of DNA repair pathway choice in S and G2 phases by the NHEJ inhibitor CYREN. Nature 549:548-552
Chu, Qian; Rathore, Annie; Diedrich, Jolene K et al. (2017) Identification of Microprotein-Protein Interactions via APEX Tagging. Biochemistry 56:3299-3306
Ma, Jiao; Diedrich, Jolene K; Jungreis, Irwin et al. (2016) Improved Identification and Analysis of Small Open Reading Frame Encoded Polypeptides. Anal Chem 88:3967-3975
Saghatelian, Alan; Couso, Juan Pablo (2015) Discovery and characterization of smORF-encoded bioactive polypeptides. Nat Chem Biol 11:909-916
Chu, Qian; Ma, Jiao; Saghatelian, Alan (2015) Identification and characterization of sORF-encoded polypeptides. Crit Rev Biochem Mol Biol 50:134-141
Slavoff, Sarah A; Heo, Jinho; Budnik, Bogdan A et al. (2014) A human short open reading frame (sORF)-encoded polypeptide that stimulates DNA end joining. J Biol Chem 289:10950-10957
Ma, Jiao; Ward, Carl C; Jungreis, Irwin et al. (2014) Discovery of human sORF-encoded polypeptides (SEPs) in cell lines and tissue. J Proteome Res 13:1757-1765
Pauli, Andrea; Norris, Megan L; Valen, Eivind et al. (2014) Toddler: an embryonic signal that promotes cell movement via Apelin receptors. Science 343:1248636

Showing the most recent 10 out of 12 publications