All successful state-of-the-art protein docking methods employ a so-called multistage approach. In the first stage, a rough energy potential is used to score billions of conformations. In the second stage, the thousands of conformations with the best scores are retained and clustered according to a similarity metric. The cluster centers serve as putative predictions (models). Recent work by the proposing team demonstrated that greater prediction quality can be achieved by properly exploring these clusters through a process called refinement. This work resulted in a prototype refinement approach, the Semi-Definite programming-based Underestimation (SDU) method. The central goal of the project is to build on the success of SDU and develop a new high-throughput refinement protocol that produces predictions of near-crystallographic quality in the most computationally efficient manner. Efficiency will be achieved by leveraging the funnel-like shape that binding free energy potentials exhibit.
The specific aims are: (1) the development of a new clustering method that can classify the conformations retained from a first-stage method into clusters suitable for the proposed refinement strategy; (2) the characterization of the structure of the multi-dimensional funnel corresponding to each cluster and the development of an efficient refinement strategy to explore this funnel; (3) the development of a side-chain positioning algorithm appropriate for docking by leveraging Markov random field theory; and (4) the dissemination of the algorithms developed through the release to the research community of a software package and an automated refinement server. It is anticipated that the proposed refinement protocol will be more than two orders of magnitude more computationally efficient than alternative Monte Carlo methods while, at the same time, significantly improving upon the accuracy achieved by earlier refinement approaches. A novelty of the proposed work is its use of sophisticated machinery from the fields of optimization and decision theory, specially tailored to the biophysical properties of the docking problem. Techniques from convex and combinatorial optimization, machine learning, and Markov random fields are brought to bear on the refinement stage of multistage protein docking approaches. An important element of the work is the systematic characterization of multi-dimensional binding energy funnels. The existence of such funnels has long been conjectured, but it has not led to new docking approaches so far. The proposed algorithms aim to close this gap by devising efficient strategies to identify, characterize, and explore these funnels.
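The first two stages of the multistage approach described above (score many conformations with a rough potential, retain the best, cluster them) can be illustrated with a toy sketch. Everything in it is an assumption made for illustration: conformations are stand-in points in a three-dimensional space, `toy_energy` is a made-up noisy funnel rather than a real docking potential, and `greedy_cluster` is a generic greedy neighbor-count scheme, not the clustering method proposed in aim (1) or the SDU refinement itself.

```python
import math
import random

def toy_energy(pose, native, noise=0.3):
    """Hypothetical rough potential: a funnel centered on `native` plus noise."""
    return math.dist(pose, native) + random.gauss(0.0, noise)

def retain_top(poses, scores, k):
    """Keep the k lowest-scoring poses (lower score = better)."""
    order = sorted(range(len(poses)), key=lambda i: scores[i])[:k]
    return [poses[i] for i in order], [scores[i] for i in order]

def greedy_cluster(poses, radius):
    """Repeatedly pick the pose with the most unassigned neighbors within
    `radius` as a cluster center, then remove that cluster."""
    remaining = set(range(len(poses)))
    clusters = []
    while remaining:
        candidates = sorted(remaining)

        def neighbor_count(i):
            return sum(1 for j in candidates
                       if math.dist(poses[i], poses[j]) <= radius)

        center = max(candidates, key=neighbor_count)
        members = [j for j in candidates
                   if math.dist(poses[center], poses[j]) <= radius]
        clusters.append((center, members))
        remaining -= set(members)
    return clusters

# Stage 1: score a (here tiny) set of sampled conformations.
random.seed(0)
native = (1.0, -2.0, 0.5)  # hypothetical location of the funnel bottom
poses = [tuple(random.uniform(-5, 5) for _ in range(3)) for _ in range(2000)]
scores = [toy_energy(p, native) for p in poses]

# Stage 2: retain the best-scoring conformations and cluster them.
top, top_scores = retain_top(poses, scores, k=100)
clusters = greedy_cluster(top, radius=1.0)

# Each cluster center is a putative model that refinement would then explore.
for center, members in clusters[:3]:
    print(top[center], len(members))
```

In this sketch the clusters come out in order of non-increasing size, so the first few centers are the most populated regions among the retained conformations. A refinement stage would take each such cluster as its starting point.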

Public Health Relevance

This work will substantially improve upon computational methods for characterizing and predicting protein-protein interactions. It will make it possible to treat relatively weak protein complexes involving larger proteins than is possible today. This will result in a better understanding of processes such as metabolic control, immune response, signal transduction, and gene regulation.

National Institutes of Health (NIH)
National Institute of General Medical Sciences (NIGMS)
Research Project (R01)
Study Section
Macromolecular Structure and Function D Study Section (MSFD)
Program Officer
Wehrle, Janna P
Boston University
Engineering (All Types)
Schools of Engineering
United States
Dai, Wuyang; Brisimi, Theodora S; Adams, William G et al. (2015) Prediction of hospitalization due to heart diseases by supervised learning methods. Int J Med Inform 84:189-97
Bohnuud, Tanggis; Kozakov, Dima; Vajda, Sandor (2014) Evidence of conformational selection driving the formation of ligand binding sites in protein-protein interfaces. PLoS Comput Biol 10:e1003872
Yakubovskaya, Elena; Guja, Kip E; Eng, Edward T et al. (2014) Organization of the human mitochondrial transcription initiation complex. Nucleic Acids Res 42:4100-12
Mottarella, Scott E; Beglov, Dmitri; Beglova, Natalia et al. (2014) Docking server for the identification of heparin binding sites on proteins. J Chem Inf Model 54:2068-78
Bogorad, Andrew M; Xia, Bing; Sandor, Dana G et al. (2014) Insights into the architecture of the eIF2Bα/β/δ regulatory subcomplex. Biochemistry 53:3432-45
Vajda, Sandor; Hall, David R; Kozakov, Dima (2013) Sampling and scoring: a marriage made in heaven. Proteins 81:1874-84
Golden, Mary S; Cote, Shaun M; Sayeg, Marianna et al. (2013) Comprehensive experimental and computational analysis of binding energy hot spots at the NF-κB essential modulator/IKKγ protein-protein interface. J Am Chem Soc 135:6242-56
Lavi, Assaf; Ngan, Chi Ho; Movshovitz-Attias, Dana et al. (2013) Detection of peptide-binding sites on protein surfaces: the first step toward the modeling and targeting of peptide-mediated interactions. Proteins 81:2096-105
Kozakov, Dima; Hall, David R; Chuang, Gwo-Yu et al. (2011) Structural conservation of druggable hot spots in protein-protein interfaces. Proc Natl Acad Sci U S A 108:13528-33
Kozakov, Dima; Hall, David R; Beglov, Dmitri et al. (2010) Achieving reliability and high accuracy in automated protein docking: ClusPro, PIPER, SDU, and stability analysis in CAPRI rounds 13-19. Proteins 78:3124-30