All living organisms on Earth use at least 20 amino acids as basic building blocks to make proteins. It is not known, however, whether all of these building blocks are essential or whether the set can be reduced further. Answering this question will yield fundamental insights into the origin of life on the planet and help us better understand the organizing rules of life. The goal of this research project is to apply cutting-edge methods in gene and genome synthesis to assess whether life can exist with a reduced number of amino acids. Ultimately, success in this research endeavor will lead to new capabilities in protein design and testing, and provide information regarding the fundamental rules of life. The educational outreach aspect of this work will focus on synthetic microbiology and microbiome topics, providing a concrete venue to increase science literacy, interest, and dialog in synthetic biology, genomics, and biotechnology among the general public.

Recent advances in synthetic biology have enabled the total chemical synthesis of viral, bacterial, and eukaryotic genomes. New design principles have been applied to these synthetic genomes to test, with great success, our understanding of the fundamental rules governing life, including the genetic code, genomic architecture, and genome minimization. A universal property of all known life is the use of the canonical 20 amino acids throughout the cell life cycle, from growth to replication and cell division. No free-living organism has been found to date that uses an amino acid alphabet composed of fewer than 20 amino acids. The overall aim of this project is to explore whether every one of the 20 canonical amino acids is essential for life or whether some are dispensable and can be removed at genome scale. We will computationally model protein sequence-function relationships and perform proteome-wide residue reassignments to build recoded proteins that use only 19 or fewer amino acids in their alphabet. Through development of new computational protein models and algorithms, including semi-supervised and deep-learning approaches, we will generate new protein variants and experimentally test them in high throughput using next-generation gene synthesis and multiplex evaluation strategies in living bacteria. Finally, we will attempt to build a partial bacterial genome, applying these residue recoding principles genome-wide in a piecewise fashion, and assess the feasibility of amino acid minimization to advance retro-synthetic biology. This effort seeks to make large-scale coding changes to proteins to a degree that has never been attempted before and will lead to the development of new foundational computational, experimental, and synthetic biology and genomics tools relevant to the field for the next decade.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

National Science Foundation (NSF)
Division of Molecular and Cellular Biosciences (MCB)
Program Officer
Anthony Garza
Columbia University
New York
United States