Short peptides (10-100 aa) are important regulators of physiology, development and metabolism; however, they are difficult to detect because of their small size and low abundance. A striking 30% of annotated human smORF genes include disease-associated variants mapped within exons, compared to 15% of human genes in general. Further, many smORFs are conserved across the entire metazoan phylogeny, from invertebrates to vertebrates, including humans. We call these ultra-conserved functional smORF genes the Conserved smORF Catalog (CSC). These genes have been conserved across more than 500 million years of evolution, and yet we know almost nothing about their functions. Owing to a century of genetic analysis, the genome of the model organism Drosophila melanogaster has the most complete functional annotation among metazoans. Functional annotations derived from Drosophila have been instrumental in hypothesis-based drug development for more than thirty years, and more recently have made possible the biological interpretation of hundreds of SNPs detected in genome-wide association studies (GWAS). Hence, functional annotations derived in fly for conserved genes are transferable to human and are of direct clinical relevance. Remarkably, fewer than 10% of smORFs in Drosophila have been studied functionally or experimentally verified as generating peptides. We will use a combination of genome engineering, computational, molecular, and functional studies to systematically and comprehensively characterize the CSC. This will be the first genome-scale characterization of smORFs in any organism and will provide a wealth of information on the biological functions of this poorly studied class of proteins. In total, we will characterize and functionally annotate ~400 conserved smORFs using CRISPR knockout followed by phenotyping and rescue assays.
We will assess the phenotypes of the mutants, measuring viability, morphology, fecundity and fertility, lifespan, metabolism (sugar and lipid levels), and a number of behavioral phenotypes. For smORFs with robust phenotypes, we will then attempt to rescue a subset of these mutants in three ways: first, by inserting the whole deleted RNA; second, with a version of the RNA in which the smORF(s) have been removed by the addition of a stop codon; and third, using a micro-construct containing only the smORF and the endogenous promoter. We will generate direct evidence for translation using tagged expression analysis and targeted MS/MS to scan for predicted polypeptides in whole-embryo and dissected-tissue samples. In addition to validating the existence of the predicted molecules, this dataset will provide a foundational gold standard for further development of tools for the computational prediction of functional micropeptides. These studies are directed toward the understanding of basic life processes and lay the foundation for promoting better human health.

Public Health Relevance

Our studies will provide a public resource, combining genome-scale phenotyping with detailed functional characterization to assess the effects of evolutionarily conserved small open reading frames (smORFs) on animal viability, development, fecundity, metabolism, longevity and behavior. We will apply state-of-the-art methods in ribosome profiling, CRISPR genome engineering and targeted mass spectrometry, together with new computational tools and analyses, to generate a foundational gold standard dataset for the study of smORFs and the prediction of functional smORFs in genome annotation. Many of the genes encoding these molecules have been found to play important roles in human diseases such as neurodegeneration, developmental disorders and cancer.

Agency: National Institutes of Health (NIH)
Institute: National Human Genome Research Institute (NHGRI)
Type: Research Project (R01)
Project #: 5R01HG009352-02
Application #: 9548692
Study Section: Genomics, Computational Biology and Technology Study Section (GCAT)
Program Officer: Feingold, Elise A
Project Start: 2017-09-01
Project End: 2020-06-30
Budget Start: 2018-07-01
Budget End: 2019-06-30
Support Year: 2
Fiscal Year: 2018
Total Cost:
Indirect Cost:
Name: Lawrence Berkeley National Laboratory
Department:
Type:
DUNS #: 078576738
City: Berkeley
State: CA
Country: United States
Zip Code: 94720