Since the beginning of the -omics era, reference protein databases have defined the list of proteins believed to be expressed in cells. However, most proteins in these databases were constructed by conceptual translation of mRNAs based on a limited set of prediction rules, such as the rule that translation starts at the AUG that yields the longest open reading frame (ORF). Recent mass spectrometry-based proteomics and ribosome profiling studies have discovered that translation can also start at non-AUG codons, i.e., noncanonical translation initiation sites (TISs), generating small ORFs or N-terminal extensions, collectively called cryptic ORFs. Strikingly, a growing number of studies report that cryptic ORFs are involved in the pathogenesis of many diseases, including Alzheimer's disease (AD), the most prevalent neurodegenerative disease. In AD, one of the translation initiation factors, eIF2α, is inactivated by phosphorylation. Consequently, eIF2A takes over the role of eIF2α, switching general translation to gene-specific translation. Because eIF2A is less strict in requiring AUG at a TIS, its involvement in translation initiation increases the usage of non-AUG codons, leading to the expression of more cryptic ORFs. Therefore, discovering cryptic ORFs that are differentially expressed in AD would be of great importance for a deeper understanding of AD pathogenesis. Moreover, the method established here can be applied to other neurological diseases as well. Nevertheless, these proteins have so far gone undetected because of incomplete reference databases and technical limitations. Since ~90% of human proteins carry N-terminal acetylation, this modification can serve as a signature for TISs, and in-depth identification of N-terminally acetylated peptides will enable in-depth identification of cryptic ORFs.
Our group has already identified over 120 cryptic ORFs from human samples using an N-terminal peptide enrichment technology, but even deeper proteome analysis is required to cover most of the cryptic ORFs differentially expressed in AD brains. To achieve this goal, we propose three aims.
In Aim 1, we will develop a new method for the identification of N-terminally acetylated peptides by combining two methods that were originally developed for the identification of proteolytic cleavage sites: the TAILS method and the subtiligase-based method.
In Aim 2, we will use a nascent protein enrichment method to determine whether acetylated peptides mapping to non-cognate codons represent bona fide TISs or arise from acetylation after proteolytic cleavage.
In Aim 3, we will discover cryptic ORFs differentially expressed in AD brain using the method developed in Aim 1 together with an in-depth total proteome analysis strategy. This project will expand and optimize our prior work on cryptic TISs and lead to the discovery of cryptic ORFs expressed in AD brains. We hypothesize that these novel cryptic ORFs are involved in the pathogenesis of AD and represent new therapeutic targets. Moreover, these approaches can be applied to the study of cryptic ORFs in other neurological diseases.
A cryptic open reading frame (ORF) is a protein translated from an alternative ORF of an mRNA in addition to the main ORF. Translation of cryptic ORFs mostly starts at noncanonical translation initiation sites, generating small ORFs or N-terminally extended forms of the main ORFs. Several studies have reported that cryptic ORFs are involved in disease pathogenesis, especially in Alzheimer's disease (AD). Therefore, determining which cryptic ORFs are involved in AD, and how, will provide invaluable insight into AD pathogenesis. Nevertheless, in-depth identification of cryptic ORFs expressed in AD has not yet been conducted. We propose to develop a method for in-depth identification of cryptic ORFs and to discover cryptic ORFs differentially expressed in AD brain. This study will help elucidate the involvement of cryptic ORFs not only in AD but also in other neurological diseases.