Our long-term objective is to turn computational protein design into a disruptive technology platform that will enable the routine and rapid generation of reagents for detecting proteins or perturbing their functions. Currently, research and therapy rely on small molecules and/or antibodies for these tasks. These are powerful tools, but they can be slow and expensive to develop, and they do not meet all needs. Designer peptides or mini-proteins have high potential to bind extracellular or intracellular targets either as labels (e.g., for imaging) or as functional modulators (e.g., interaction inhibitors), for applications in basic and clinical research and in disease diagnosis and treatment. Existing tools for designing such custom proteins rely on experimental library screening, sometimes guided or supported by computational modeling of structure. Despite the immense value such molecules would bring to basic biomedical research and therapeutic development, there are not yet rapid and facile routes to obtaining designed proteins with desired properties. Computational methods can potentially address this need, but existing technology is not sufficiently reliable, flexible or automated for routine use. Compared with the mid-1990s, when the modern approach to computational protein design was developed, we live in a data- and technology-empowered age. The premise of this proposal is that we can increase the range of problems that can be solved using computational design, and also dramatically improve success rates, by making full use of the proven rules of sequence-structure compatibility encoded in known natural structures and their homologous sequences. The Protein Data Bank (the collection of all known protein structures) has grown 10-fold since 2000, placing us at a point where we can design novel proteins by constructing them from building blocks used in nature.
We have implemented a new design framework that is based on this principle and that is different in fundamental aspects from all previously published alternatives. Tests on diverse tasks demonstrate outstanding success. To further develop our approach, we propose methodological advances that we will implement, test and then apply to protein design challenges involving detecting or inhibiting protein recognition domains. We will develop and apply methods to: automatically identify design strategies for binding to a target protein, score and rank specific design candidates, design libraries that will be screened to provide rich experimental data about successes and failures, and automatically feed experimental data back into model development in a principled way. Outcomes will include new methodology that will be shared with the community, computational predictions of high-ranked interface design sites that can inform analysis of structures and pathways, and experimentally validated designer molecules that bind to protein domains important for signaling in disease pathways.

Public Health Relevance

Scientists and physicians need molecules that can selectively bind proteins for use in research, disease diagnosis, and therapy. This proposal describes a new approach that combines computational and experimental methods and exploits diverse sources of data to accelerate the discovery of designed protein-interaction agents.

National Institutes of Health (NIH)
National Institute of General Medical Sciences (NIGMS)
Research Project (R01)
Study Section: Macromolecular Structure and Function D Study Section (MSFD)
Program Officer: Lyster, Peter
Dartmouth College
Biostatistics & Other Math Sci
Graduate Schools
United States