This proposal is entitled Binding-Site Modeling with Multiple-Instance Machine-Learning. One of the most challenging and longest-studied problems in computer-aided drug design has been affinity prediction of small-molecule ligands for their cognate protein targets. Despite decades of work, quantitative structure-activity relationship (QSAR) prediction approaches still suffer from poor accuracy, especially when predicting outside of closely related series of molecules. Even with high-quality structures of target proteins, approaches grounded in physics are also far from robust and accurate enough for reliable use in drug lead optimization. This proposal will build upon a foundation in multiple-instance machine learning applied to computer-aided drug design problems and develop a robust, accurate, and practically applicable affinity prediction methodology. The methodology requires only ligand structures and associated activity data for training, and it induces a virtual protein binding site composed of molecular fragments. The virtual binding pocket (or pocketmol) is used in conjunction with a scoring function developed originally for molecular docking. The pocketmol configuration is chosen such that the optimal conformation and alignment of each training ligand (as judged by the docking scoring function) yields a score that is close to the known experimental value. Feasibility has been demonstrated in papers involving both membrane-bound receptors and enzymes.
However, multiple challenges remain and are the subject of the proposed research. There are three key issues. First, there exist many pocketmols that satisfy the requirements of fitting the training data, so general solutions must be developed to address the inductive bias of the learning procedure as well as model selection after the procedure. Second, since any particular model is the product of a learning process, it will have some domain of applicability, with some new molecules likely to be predicted well and others poorly. Further, the model will be better informed by learning with certain new molecules but not others. We must develop solutions for estimating confidence of predictions for new molecules as well as for identifying particular molecules that will be highly informative. Third, the operational application of these methods involves model building, guided chemical synthesis, and iterative refinement of models. Convincing validation will require application on temporal series of molecules synthesized for multiple targets of pharmaceutical interest. The proposed work will develop novel methods to address these challenges and will establish extensive validation on multiple pharmaceutically relevant temporal series of small molecules that were the subject of real-world lead-optimization exercises.
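The multiple-instance formulation described in the abstract can be illustrated with a small toy sketch. It is not the proposal's actual method: here each ligand is a "bag" of candidate poses, each pose is reduced to a hypothetical feature vector, and the pocket model is a plain weight vector with a linear score standing in for the docking scoring function. The names `pose_scores`, `predict`, and `fit_pocket` are invented for illustration. The sketch alternates between choosing each ligand's best-scoring pose and refitting the pocket by least squares so that best-pose scores approach the measured activities.

```python
import numpy as np


def pose_scores(pocket, poses):
    """Score every pose (instance) of one ligand against the pocket model.

    pocket: weight vector of shape (n_features,)  [assumed linear stand-in]
    poses:  array of shape (n_poses, n_features)
    """
    return poses @ pocket


def predict(pocket, poses):
    """Multiple-instance prediction: a ligand's score is that of its best pose."""
    return float(np.max(pose_scores(pocket, poses)))


def fit_pocket(bags, activities, n_iter=200):
    """Alternate best-pose selection with a least-squares refit of the pocket.

    bags:       list of (n_poses_i, n_features) arrays, one per training ligand
    activities: measured activity for each ligand
    """
    y = np.asarray(activities, dtype=float)
    # Initial guess: represent each ligand by the mean of its pose features.
    X = np.stack([b.mean(axis=0) for b in bags])
    pocket, *_ = np.linalg.lstsq(X, y, rcond=None)
    for _ in range(n_iter):
        # Pick the optimal pose of each ligand under the current pocket.
        best = [int(np.argmax(pose_scores(pocket, b))) for b in bags]
        X = np.stack([b[i] for b, i in zip(bags, best)])
        # Refit so the best-pose scores approach the measured activities.
        new_pocket, *_ = np.linalg.lstsq(X, y, rcond=None)
        if np.allclose(new_pocket, pocket):
            break
        pocket = new_pocket
    return pocket


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    true_pocket = np.array([1.5, -0.5, 2.0])  # synthetic ground truth
    bags = [rng.normal(size=(5, 3)) for _ in range(20)]
    acts = [predict(true_pocket, b) for b in bags]
    model = fit_pocket(bags, acts)
    errors = [abs(predict(model, b) - a) for b, a in zip(bags, acts)]
    print("mean absolute training error:", np.mean(errors))
```

Because the pose choice depends on the model and the model depends on the pose choice, different starting points can converge to different pockets that fit the training data comparably well. Even in this toy setting, that mirrors the first issue raised above, the need for inductive bias and model selection.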

Public Health Relevance

The dominant mode of therapeutic discovery involves the design of 'me-too' drugs that are very similar in structure and effect to existing drugs. In order to address the unmet medical needs of an aging population, novel therapeutics must be developed, and this will require much more creativity in the design process. The proposed research will develop a predictive computational framework to aid in the active design of structurally novel drug molecules during the lead-optimization stage of drug discovery.

National Institutes of Health (NIH)
National Institute of General Medical Sciences (NIGMS)
Research Project (R01)
Study Section
Biodata Management and Analysis Study Section (BDMA)
Program Officer
Wu, Mary Ann
University of California San Francisco
Schools of Pharmacy
San Francisco
United States
Cleves, Ann E; Jain, Ajay N (2018) Quantitative surface field analysis: learning causal models to predict ligand binding affinity and pose. J Comput Aided Mol Des :
Cleves, Ann E; Jain, Ajay N (2017) ForceGen 3D structure and conformer generation: from small lead-like molecules to macrocyclic drugs. J Comput Aided Mol Des 31:419-439
Cleves, Ann E; Jain, Ajay N (2016) Extrapolative prediction using physically-based QSAR. J Comput Aided Mol Des 30:127-52
Cleves, Ann E; Jain, Ajay N (2015) Chemical and protein structural basis for biological crosstalk between PPARα and COX enzymes. J Comput Aided Mol Des 29:101-12
Cleves, Ann E; Jain, Ajay N (2015) Knowledge-guided docking: accurate prospective prediction of bound configurations of novel ligands using Surflex-Dock. J Comput Aided Mol Des 29:485-509
Yera, Emmanuel R; Cleves, Ann E; Jain, Ajay N (2014) Prediction of off-target drug effects through data fusion. Pac Symp Biocomput :160-71
Spitzer, Russell; Cleves, Ann E; Varela, Rocco et al. (2014) Protein function annotation by local binding site surface similarity. Proteins 82:679-94
Varela, Rocco; Cleves, Ann E; Spitzer, Russell et al. (2013) A structure-guided approach for protein pocket modeling and affinity prediction. J Comput Aided Mol Des 27:917-34
Varela, Rocco; Walters, W Patrick; Goldman, Brian B et al. (2012) Iterative refinement of a binding pocket model: active computational steering of lead optimization. J Med Chem 55:8926-42
Jain, Ajay N; Cleves, Ann E (2012) Does your model weigh the same as a duck? J Comput Aided Mol Des 26:57-67
