The recent explosion of nucleic acid sequencing capacity has made high-throughput analysis of transcriptional and genomic variation routine. The goal of the proposed research is to create a toolkit of molecular reagents that enables similarly multiplexed single-molecule sequencing of proteins and peptides. Recent advances in single-molecule detection make this a feasible goal. However, in contrast to nucleic acid sequencing, nature has not provided suitable enzymes and amino acid-identifying proteins for this analysis, so protein design and engineering must be used to generate the necessary molecular reagents. The long-range strategy we envision for protein sequencing is to perform Edman degradation on single molecules. Protein engineering will be used to adapt naturally occurring proteins with intrinsic affinity and specificity for free amino acids to serve as sequence-specific binders of N-terminal residues in peptides. Visualization will be performed by single-molecule fluorescence microscopy. A cysteine protease will be engineered to remove terminal amino acids, regenerating a new peptide N-terminus for subsequent rounds of sequencing. The ready availability of proteins that recognize post-translationally modified amino acids, as well as the twenty canonical amino acids, suggests that this method can be applied to study the post-translational state, as well as the content, of the proteome.
The specific aims are 1) to engineer tRNA synthetases to serve as N-terminal sequencing reagents, 2) to modify a cysteine protease to remove N-terminal amino acids that have been modified with the Edman reagent, and 3) to engineer a set of three proteins to enable the sequencing of phosphorylated amino acids. Preliminary results demonstrate that these aims are feasible. Completion of this research will move next-generation protein sequencing much closer to reality.
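The cycle described above (identify the N-terminal residue, remove it, repeat) can be illustrated with a minimal sketch. This is a toy model only, not the proposed experimental method: the function names `sequence_peptide`, `ideal_recognize`, and `ideal_cleave` are hypothetical, and the binder panel and protease are assumed to be error-free, which real reagents would not be.

```python
def sequence_peptide(peptide, recognize, cleave):
    """Read a peptide one N-terminal residue at a time (toy model)."""
    calls = []
    while peptide:
        # Binder step: report the identity of the current N-terminal residue.
        calls.append(recognize(peptide[0]))
        # Protease step: remove that residue to expose a new N-terminus.
        peptide = cleave(peptide)
    return "".join(calls)


def ideal_recognize(residue):
    # Assumption: a perfect binder panel that reports the residue unchanged.
    return residue


def ideal_cleave(peptide):
    # Assumption: a perfect protease that removes exactly one N-terminal residue.
    return peptide[1:]


if __name__ == "__main__":
    # Peptides are written as one-letter amino acid codes, N-terminus first.
    print(sequence_peptide("MKTAY", ideal_recognize, ideal_cleave))
```

Passing the recognition and cleavage steps in as functions makes it straightforward to substitute imperfect models, for example a binder that sometimes miscalls a residue, without changing the loop itself.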

Public Health Relevance

The complete and quantitative analysis of the proteomic inventory is crucial for identifying and measuring biomarkers, those proteins whose levels differ between diseased and unaffected tissues. Identifying medical problems as early as possible improves outcomes. Determining phosphorylation state as part of this analysis allows even finer distinctions between the biologically relevant states of different samples. The molecules we propose to engineer for protein sequencing could enable proteomic analysis that has excellent dynamic range, is inherently quantitative, and is sensitive to amino acid phosphorylation state.

National Institutes of Health (NIH)
National Institute of General Medical Sciences (NIGMS)
Research Project (R01)
Study Section
Macromolecular Structure and Function B Study Section (MSFB)
Program Officer
Edmonds, Charles G
Washington University
Schools of Medicine
Saint Louis
United States
Bodmer, Nicholas K; Havranek, James J (2018) Efficient minimization of multipole electrostatic potentials in torsion space. PLoS One 13:e0195578
Chang, Yiming K; Srivastava, Yogesh; Hu, Caizhen et al. (2017) Quantitative profiling of selective Sox/POU pairing on hundreds of sequences in parallel by Coop-seq. Nucleic Acids Res 45:832-845
Rastogi, Suchita; Borgo, Ben; Pazdernik, Nanette et al. (2015) Caenorhabditis elegans glp-4 Encodes a Valyl Aminoacyl tRNA Synthetase. G3 (Bethesda) 5:2719-28
Sasaki, Yo; Margolin, Zachary; Borgo, Benjamin et al. (2015) Characterization of Leber Congenital Amaurosis-associated NMNAT1 Mutants. J Biol Chem 290:17228-38
Zhang, Chi; Myers, Connie A; Qi, Zongtai et al. (2015) Redesign of the monomer-monomer interface of Cre recombinase yields an obligate heterotetrameric complex. Nucleic Acids Res 43:9076-85
Joyce, Adam P; Zhang, Chi; Bradley, Philip et al. (2015) Structure-based modeling of protein: DNA specificity. Brief Funct Genomics 14:39-49
Borgo, Benjamin; Havranek, James J (2014) Motif-directed redesign of enzyme specificity. Protein Sci 23:312-20
Lyskov, Sergey; Chou, Fang-Chieh; Conchúir, Shane Ó et al. (2013) Serverification of molecular modeling applications: the Rosetta Online Server that Includes Everyone (ROSIE). PLoS One 8:e63906
Leaver-Fay, Andrew; O'Meara, Matthew J; Tyka, Mike et al. (2013) Scientific benchmarks for guiding macromolecular energy function improvement. Methods Enzymol 523:109-43