Organisms are defined by the information encoded in their genomes, and since the evolution of life as we know it, this information has been encoded using a four-letter genetic alphabet, made possible by the selective pairing of (d)G with (d)C and (d)A with dT or U. The creation of a third, unnatural base pair (UBP) would have profound implications for our understanding of what life is and how it may have evolved, and could serve as the foundation of a semi-synthetic organism (SSO): a living cell that stores information beyond that of the natural genetic alphabet and retrieves it in the form of proteins containing unnatural amino acids. Such an organism has great potential to improve human health, as proteins now constitute an important class of therapeutics, but their utility is currently restricted by the limited physicochemical diversity of the twenty natural amino acids. Since 1999, our NIH-funded work has focused on the development of a UBP. Our strategy uses hydrophobic and packing forces, as opposed to Watson-Crick-like hydrogen bonds, to optimize UBP formation, as we have found that such forces are strong and disfavor mispairing with the natural, more hydrophilic nucleotides. This effort reached major milestones in 2009, with our discovery of the UBP formed between dNaM and d5SICS, and in 2014, with our engineering of an E. coli-based SSO that stably harbors this UBP in its DNA. In the past year, we have continued to optimize the SSO, including the optimization and genomic integration of the gene encoding the nucleoside triphosphate transporter from Phaeodactylum tricornutum (PtNTT2), which enables import of dNaMTP and d5SICSTP into the cell. With the optimized PtNTT2 now under the control of a constitutive promoter, the SSO is healthier and always competent for unnatural triphosphate uptake.
While we have discovered that replication of the UBP proceeds with a sequence bias, we have already made progress toward eliminating the bias by continuing to optimize the UBP and by introducing an error-correction mechanism mediated by CRISPR-associated protein-9 nuclease (Cas9). We have also demonstrated that DNA containing the UBP may be transcribed within the SSO into RNA containing unnatural nucleotides. Although continued exploration and optimization are still required, the major challenges of creating the first form of life that stably harbors and retrieves information beyond that encoded by the natural genetic alphabet have been identified. These include the optimization of replication to eliminate the observed sequence bias; the optimization of transcription, including the transcription of mRNAs and tRNAs; and, lastly, the demonstration of efficient translation. Strategies toward overcoming these challenges are described. If successful, our efforts will yield the first form of life that faithfully stores and retrieves information beyond that encoded by the natural genetic alphabet, and will result in a general platform for the production of diverse therapeutic proteins that could revolutionize medicine.
The four nucleotide 'letters,' which come in 'base pairs' of G-C and A-T/U, form the 64 triplet nucleotide codon 'words' that provide every cell with the instructions for producing proteins, the molecules of life. This project aims to engineer an organism capable of maintaining and utilizing six nucleotide letters that form three base pairs, and that thus has extra codons with which to produce proteins with novel components and unprecedented diversity. Such an organism would have many uses for human health, but may be most impactful in the burgeoning field of protein therapeutics.
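
The codon arithmetic behind this statement can be made concrete with a short sketch. It enumerates all triplets over a four-letter and a six-letter RNA alphabet and counts the triplets that contain at least one unnatural letter. The symbols "X" and "Y" are placeholders chosen here for illustration, not notation from this project, and the counts are a purely combinatorial upper bound: they say nothing about which codons could actually be replicated, transcribed, or decoded in a cell.

```python
from itertools import product

# Natural RNA alphabet.
NATURAL = "ACGU"
# "X" and "Y" are hypothetical placeholder symbols for the two
# unnatural nucleotides; they are an assumption made for this sketch.
EXPANDED = NATURAL + "XY"


def codons(alphabet, length=3):
    """Return every possible codon of the given length over the alphabet."""
    return ["".join(letters) for letters in product(alphabet, repeat=length)]


natural_codons = codons(NATURAL)
expanded_codons = codons(EXPANDED)

# Codons that contain at least one unnatural letter.
new_codons = [c for c in expanded_codons if set(c) - set(NATURAL)]

print(len(natural_codons))   # 4**3 = 64
print(len(expanded_codons))  # 6**3 = 216
print(len(new_codons))       # 216 - 64 = 152
```

Under these assumptions, a six-letter alphabet allows up to 216 triplets, of which 152 are not available to the natural four-letter alphabet.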