Organisms are defined by the information encoded in their genomes, and since the evolution of life as we know it, this information has been encoded using a four-letter genetic alphabet, made possible by the selective pairing of (d)G with (d)C and (d)A with dT or U. The creation of a third, unnatural base pair (UBP) would have profound implications for our understanding of what life is and how it may have evolved, and could serve as the foundation of a semi-synthetic organism (SSO), a living cell that stores information beyond that of the natural genetic alphabet and retrieves it in the form of proteins containing unnatural amino acids. This has great potential to improve human health, as proteins now constitute an important class of therapeutics, but their utility is currently restricted by the limited physicochemical diversity of the twenty natural amino acids. Since 1999, our NIH-funded work has focused on the development of a UBP. Our strategy is based on the use of hydrophobic and packing forces, as opposed to Watson-Crick-like hydrogen bonds, to optimize UBP formation, as we have found that such forces are strong and disfavor mispairing with the natural, more hydrophilic nucleotides. This effort reached major milestones in 2009 with our discovery of the UBP formed between dNaM and d5SICS, and in 2014 with our engineering of an E. coli-based SSO that stably harbors this UBP in its DNA. In the past year, we have continued to optimize the SSO, including the optimization and genomic integration of the gene encoding the nucleoside triphosphate transporter from Phaeodactylum tricornutum (PtNTT2), which makes import of dNaMTP and d5SICSTP into the cell possible. With the optimized PtNTT2 now under the control of a constitutive promoter, the SSO is healthier and always competent for unnatural triphosphate uptake.
While we have discovered that replication of the UBP proceeds with a sequence bias, we have already made progress towards eliminating the bias by continuing to optimize the UBP, and by introducing an error-correction mechanism mediated by CRISPR-associated protein-9 nuclease (Cas9). We have also demonstrated that DNA containing the UBP may be transcribed within the SSO into RNA containing unnatural nucleotides. Although continued exploration and optimization are still required, the major challenges of creating the first form of life that stably harbors and retrieves information beyond that encoded by the natural genetic alphabet have been identified. These include the optimization of replication to eliminate the observed sequence bias, the optimization of transcription, including the transcription of mRNAs and tRNAs, and lastly, the demonstration of efficient translation. Strategies toward overcoming these challenges are described. If successful, our efforts will yield the first form of life that faithfully stores and retrieves information beyond that encoded by the natural genetic alphabet, and will result in a general platform for the production of diverse therapeutic proteins that could revolutionize medicine.

Public Health Relevance

The four nucleotide 'letters,' which come in 'base pairs' of G-C and A-T/U, form the 64 triplet nucleotide codon 'words' that provide every cell with the instructions for producing the molecules of life, proteins. This project aims to engineer an organism capable of maintaining and utilizing six nucleotide letters that form three base pairs, and that thus has extra codons with which to produce proteins with novel components and unprecedented diversity. Such an organism would have many uses for human health, but would perhaps be most impactful in the burgeoning field of protein therapeutics.
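
The codon arithmetic behind this statement can be illustrated with a minimal sketch. It assumes a triplet code over each alphabet and uses the placeholder symbols X and Y for the two unnatural nucleotides; these labels are illustrative only and are not the project's notation. The count is purely combinatorial and says nothing about which of the additional codons could actually be replicated, transcribed, or decoded in a cell.

```python
from itertools import product

NATURAL = "ACGT"
# "X" and "Y" are hypothetical placeholder symbols for the two
# unnatural nucleotides; they are assumptions made for illustration.
EXPANDED = NATURAL + "XY"


def codons(alphabet, length=3):
    """Return every possible codon of the given length over the alphabet."""
    return ["".join(letters) for letters in product(alphabet, repeat=length)]


natural_codons = codons(NATURAL)
expanded_codons = codons(EXPANDED)

# Codons that contain at least one unnatural letter.
new_codons = [c for c in expanded_codons if set(c) - set(NATURAL)]

print(len(natural_codons))   # 4**3 = 64
print(len(expanded_codons))  # 6**3 = 216
print(len(new_codons))       # 216 - 64 = 152
```

Under these assumptions, a six-letter alphabet gives 216 possible triplets, of which 152 contain at least one unnatural letter.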

National Institutes of Health (NIH)
National Institute of General Medical Sciences (NIGMS)
Unknown (R35)
Study Section: Special Emphasis Panel (ZGM1)
Program Officer: Fabian, Miles
Scripps Research Institute
La Jolla
United States
Feldman, Aaron W; Romesberg, Floyd E (2018) Expansion of the Genetic Alphabet: A Chemist's Approach to Synthetic Biology. Acc Chem Res 51:394-403
Feldman, Aaron W; Fischer, Emil C; Ledbetter, Michael P et al. (2018) A Tool for the Import of Natural and Unnatural Nucleoside Triphosphates into Bacteria. J Am Chem Soc 140:1447-1454
Dien, Vivian T; Holcomb, Matthew; Feldman, Aaron W et al. (2018) Progress Toward a Semi-Synthetic Organism with an Unrestricted Expanded Genetic Alphabet. J Am Chem Soc 140:16115-16123
Ledbetter, Michael P; Karadeema, Rebekah J; Romesberg, Floyd E (2018) Reprograming the Replisome of a Semisynthetic Organism for the Expansion of the Genetic Alphabet. J Am Chem Soc 140:758-765
Zhang, Yorke; Ptacin, Jerod L; Fischer, Emil C et al. (2017) A semi-synthetic organism that stores and retrieves increased genetic information. Nature 551:644-647
Morris, Sydney E; Feldman, Aaron W; Romesberg, Floyd E (2017) Synthetic Biology Parts for the Storage of Increased Genetic Information in Cells. ACS Synth Biol 6:1834-1840
Feldman, Aaron W; Dien, Vivian T; Romesberg, Floyd E (2017) Chemical Stabilization of Unnatural Nucleotide Triphosphates for the in Vivo Expansion of the Genetic Alphabet. J Am Chem Soc 139:2464-2467
Feldman, Aaron W; Romesberg, Floyd E (2017) In Vivo Structure-Activity Relationships and Optimization of an Unnatural Base Pair for Replication in a Semi-Synthetic Organism. J Am Chem Soc 139:11427-11433
Zhang, Yorke; Lamb, Brian M; Feldman, Aaron W et al. (2017) A semisynthetic organism engineered for the stable expansion of the genetic alphabet. Proc Natl Acad Sci U S A 114:1317-1322