With rare exceptions, proteins in all domains of life are biosynthesized using the same twenty canonical amino acid building blocks. However, the chemical and functional space accessible to proteins is greatly expanded in living systems by a wide variety of post-translational modifications (PTMs). These PTMs play important roles in all aspects of our biology. In recent years, the catalog of known PTMs within our proteome has expanded at a furious pace, thanks to advances in mass spectrometry (MS)-based proteomics and related technologies. However, the functional consequences of the overwhelming majority of these newly identified PTMs remain poorly characterized. At the core of this deep knowledge gap on a critically important facet of our biology lies the difficulty of producing eukaryotic proteins in a homogeneous state of modification for probing how their properties are modulated by a PTM in vitro or in vivo. For most PTMs identified through MS-proteomics, the exact biochemical origin is either unknown or challenging to reconstitute without additional pleiotropic consequences. Genetic code expansion (GCE) technology provides an exciting solution to this problem by enabling co-translational, site-specific incorporation of a modified residue into virtually any site of any protein. However, despite its enormous potential, the scope of this technology in eukaryotic systems remains limited by several technical challenges, including the restricted structural diversity of noncanonical amino acids (ncAAs) that can be genetically encoded and the poor efficiency of their incorporation.
Over the last five years, our group has greatly expanded the scope of this technology by developing innovative solutions to overcome these longstanding challenges, including: A) new platforms for genetically encoding previously inaccessible ncAAs, B) a mammalian cell-based directed evolution system to improve the performance of this machinery, and C) novel viral vectors that efficiently deliver the ncAA incorporation machinery to a wide variety of mammalian cells and tissues. These advances have opened the exciting opportunity to use this powerful technology to systematically decipher the roles of various PTMs observed in the human proteome. To this end, in the next five years, we propose to develop new GCE platforms to access new structural classes of ncAAs, use them to genetically encode previously inaccessible PTMs in eukaryotes, optimize their efficiency through directed evolution, and use them to decipher the consequences of PTMs. Furthermore, we will develop technology to systematically explore new protein-protein interactions triggered by PTMs (e.g., with reader/eraser proteins) by site-specifically incorporating two ncAAs: one modeling the PTM of interest and another harboring a photo-crosslinker. Finally, by overcoming longstanding challenges, we will dramatically advance the scope of GCE technology for application in mammalian cells, which will have a broad and deep impact far beyond the scope of this proposal.

Public Health Relevance

We propose to develop new platforms to genetically encode many previously inaccessible noncanonical amino acids to model novel protein post-translational modifications in eukaryotic cells. These optimized tools would be used to systematically probe how post-translational modifications modulate the biology of eukaryotic cells.

National Institutes of Health (NIH)
National Institute of General Medical Sciences (NIGMS)
Unknown (R35)
Study Section
Special Emphasis Panel (ZRG1)
Program Officer
Fabian, Miles
Boston College
Schools of Arts and Sciences
Chestnut Hill
United States