A proteome can be defined as the entire collection of proteins in an organism. Thus, a proteome can be viewed as the complete set of molecular machines necessary to sustain a living organism. Because many different biochemical functions are required in any particular cell, the proteome of any single organism must include a wide range of proteins with diverse amino acid sequences, three-dimensional structures, and biochemical activities. Nonetheless, the full collection of all proteins in all proteomes that ever existed on Earth constitutes a minuscule fraction of the sequences that are theoretically possible. Thus, despite billions of years of evolutionary sampling, the vast majority of sequence space remains unexplored. However, recent advances in synthetic biology, combinatorial methods, and protein design have made it possible to begin exploring sequences that have never been exposed to evolution. The proposed research aims to design and produce large collections of novel proteins that fold into well-defined three-dimensional structures and function in biologically relevant reactions. Completion of this work will represent a significant advance toward developing artificial proteomes that were not evolved by nature, but nonetheless support the growth of living organisms.

The project will have broader significance for both basic science and applied technologies. Richard Feynman said, "What I cannot create, I do not understand." Thus, the creation of novel proteins will both test our knowledge and enhance our understanding of protein biochemistry, biophysics, and molecular evolution. Design and construction of artificial proteomes will also impact applied science and biotechnology: current biotechnology relies on protein sequences borrowed from nature, while future applications will benefit from de novo sequences that were not selected by nature, but are well-suited for industrial applications. This project provides excellent interdisciplinary research training opportunities. The investigator will also present this work to the public and discuss it in the classroom.

Previous studies of proteins and proteomes were limited to sequences isolated (or modified) from natural systems. The proposed research will overcome this limitation by making available libraries of millions of novel proteins. Such collections will be 1000-fold larger than typical bacterial proteomes. In contrast to studies of natural proteomes, which reveal what was selected by nature, newly enabled studies of artificial proteomes will broaden our understanding beyond what evolved in nature, and will enable experiments that probe sequences, structures, and functions that are not observed in natural biological systems, but nonetheless can occur in the realm of novel or synthetic biologies. The research will harness both combinatorial/experimental and computational/theoretical approaches to pursue the following aims:

- Design and construction of large collections of novel alpha-helical proteins.
- Design and construction of large collections of novel beta-sheet proteins. Proteomes, whether natural or artificial, must contain both alpha and beta structures.
- Development and implementation of a high-throughput screen for folded structures. The quality of protein libraries will be enhanced by subjecting collections of computationally designed proteins to follow-up screens for structures that are soluble and stably folded.
- Determination of three-dimensional structures and stabilities of proteins from the novel proteome. Successful designs will produce sequences that fold into expected structures.
- Isolation and evolution of novel proteins that are active in vitro and functional in vivo. Most importantly, collections of novel sequences will resemble proteomes if and only if they include proteins that are biochemically active and provide essential cellular functions.

National Science Foundation (NSF)
Division of Molecular and Cellular Biosciences (MCB)
Program Officer: Susanne von Bodman
Princeton University
United States