The overall research objective of this CAREER project is to develop and assess improved computational modeling and design approaches focused on protein-protein interactions, to facilitate the engineering of selective molecules to characterize, perform and control biological functions. In Aim 1, a model will be developed to optimize the fitness of a protein sequence against multiple selectivity criteria, such as binding to preferred partners while simultaneously avoiding unwanted interactions. In Aim 2, methods for modeling changes in protein conformation in response to binding events and designed mutations will be improved and evaluated for accuracy. These models will be used to predict not only the optimal sequence, but also sets of tolerated sequences that capture the observed sequence, conformational and functional plasticity of protein-protein interactions. This research will build foundational approaches to quantitatively characterize the plasticity and selectivity of protein interfaces. A practical outcome of this work will be a computational method to design sequence libraries and provide testable predictions for new proteins that are precise enough to perform the desired function within complex biological environments.

Computational methods developed under this project will be employed in teaching activities, integrating research and educational aims to introduce graduate, undergraduate and high-school students to structure-based biological engineering. As part of a new core graduate curriculum in Quantitative Biology, lecture-based, project-planning and hands-on practical course modules in protein biocomputing will be designed to teach a foundational understanding of current capabilities and limitations of modeling, and an appreciation of opportunities resulting from tight integration of theory and experiment. Project planning and practical units will emphasize multidisciplinary approaches allowing students with experimental and theoretical backgrounds to learn from each other. Undergraduate and high-school students will carry out summer projects in protein design and synthetic biology. New and improved computational methods will be disseminated broadly via the Rosetta package of computational tools that is available free-of-charge to academic users.

Project Report

Designed proteins with new biological functions have tremendous potential to advance many areas of science and engineering. They could provide new ways to study fundamental biological processes and help to solve real-world problems. Potential applications include new enzymes and biological synthesis pathways for fuel molecules or compounds that are otherwise too expensive to produce, and robust sensors and signaling systems that can detect specific inputs and generate a precise response. Over the last decade, there have been significant advances in the computer-aided design of novel functional proteins. However, significant challenges still impede more rapid progress in the field. The intellectual merit of this research lies in the development and validation of foundational computational methods targeting two key challenges that must be addressed before proteins can be designed more accurately and the potential of designer proteins can be fully realized. The first challenge lies in engineering proteins to be precise enough to interact with their desired partners while avoiding undesired side effects. This is necessary to design molecules that can function correctly in complex biological environments, such as cells and organisms, that contain many possible interaction partners, both desired and undesired. The second key problem lies in the realistic modeling of the detailed structural changes proteins undergo, both when they function normally and in response to the sequence changes we introduce in design. To address these challenges, we made two important methodological advances. The first class of methods improved on the standard design approach, which considers only a single target protein structure. Our new computational protocols can now use multiple additional criteria, such as those defining interaction specificity, improving our ability to distinguish desired interaction partners from those that would lead to unwanted side effects.
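The idea of scoring a sequence against several criteria at once, rather than against a single target structure, can be illustrated with a minimal Python sketch. It is not the Rosetta protocol or API: the function names, the gap-based fitness, and the energy values are all hypothetical and chosen only for illustration, under the assumption that a lower energy means tighter binding.

```python
from typing import Dict, List

# Hypothetical binding energies (arbitrary units, lower = tighter binding).
# These numbers are made up for illustration and are not results.
Energies = Dict[str, float]


def specificity_fitness(energies: Energies,
                        desired: List[str],
                        undesired: List[str]) -> float:
    """Energy gap between the tightest undesired partner and the
    weakest desired partner. A larger positive gap means the sequence
    binds every desired partner more tightly than any undesired one."""
    worst_desired = max(energies[p] for p in desired)
    best_undesired = min(energies[p] for p in undesired)
    return best_undesired - worst_desired


def rank_sequences(candidates: Dict[str, Energies],
                   desired: List[str],
                   undesired: List[str]) -> List[str]:
    """Sort candidate sequences from most to least selective."""
    return sorted(
        candidates,
        key=lambda s: specificity_fitness(candidates[s], desired, undesired),
        reverse=True,
    )


candidates = {
    # Binds the target most tightly, but almost as tightly off-target.
    "seqA": {"target": -10.0, "off1": -9.0, "off2": -4.0},
    # Weaker on the target, but clearly separated from off-targets.
    "seqB": {"target": -8.0, "off1": -3.0, "off2": -2.5},
}

ranking = rank_sequences(candidates, desired=["target"],
                         undesired=["off1", "off2"])
print(ranking)
```

In this toy case, a single-state criterion (target energy alone) would prefer `seqA`, whereas the multi-criteria fitness ranks `seqB` first because its gap to the nearest undesired partner is larger.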
The second class of methods introduced a new model of conformational plasticity, a key step toward addressing a long-standing problem in protein design: the modeling of changes in backbone structure. Using this model, we demonstrated that we can predict sets of sequences ('libraries') that are enriched in functional members observed experimentally. This computational strategy for designing libraries is critical to enabling a wide range of applications in industrial and biological engineering. The broader impacts of our work lie in enabling methodologies for protein design that (i) lead to a better understanding of fundamental biological processes by probing and controlling them in new ways; and (ii) help solve important problems for the bio-economy by creating new and useful biological functions. To make our advances broadly accessible, we have disseminated all developed methods via the Rosetta suite of computational tools, which is available free of charge for academic use and currently has more than 9,000 licensees. Our methods are used in applications of considerable biological importance, such as the prediction and modulation of protein-protein interactions, which are involved in essentially all functions in living cells and organisms. The approaches are also being used to improve the activities and selectivities of protein therapeutics and industrial enzymes. Eighteen undergraduate students, graduate students, and postdoctoral fellows were directly involved in the research, including eight students and postdocs from groups underrepresented in science and engineering. We have also integrated research and education through the development of multiple courses, benefiting several hundred professional and graduate students.
Educational activities included lecture-based and project-planning units, as well as hands-on "team challenge" projects in protein design, providing a foundational understanding of the current capabilities and limitations of modeling and encouraging an appreciation for opportunities created by integration of theory and experiment. Project planning and team challenges emphasized multidisciplinary approaches where students with experimental and theoretical backgrounds learn from each other, and many participants and graduates have gone on to combine disciplines in their current work in academia and industry.

National Science Foundation (NSF)
Division of Molecular and Cellular Biosciences (MCB)
Program Officer: Kamal Shukla
University of California San Francisco
San Francisco
United States