Machine learning has broad applicability in chemistry and biology. This research effort focuses on the empirical derivation of functions that predict aspects of molecular interaction between proteins and ligands. The problem poses unusual challenges for machine learning, chief among them that the configuration in which molecules interact is not generally known. For interactions between small molecules and proteins, where molecules can be represented as 3D objects, this uncertainty appears as hidden variables in the relative conformation and alignment of protein and ligand. Most machine learning tasks do not embed hidden variables in this fashion, but the problem is not insurmountable. We have implemented a number of methods which demonstrate that the problem of hidden variables is tractable, both methodologically, in model induction and scoring function optimization, and computationally, in the complexity of search. In this work, we will develop novel methods and refine existing methods in three problem areas: 1) developing scoring functions for small-molecule/protein interactions with a known protein structure (the docking problem); 2) developing quantitative models of small-molecule activity against proteins with no known structure (the 3D QSAR problem); and 3) developing methods for search and optimization that improve both the induction of models and scoring functions and their high-throughput application to large libraries of small molecules. The goal is to address the problem of prediction in a quantifiable way, which will both allow practical improvements in applications of the methods and provide insight into the mechanistic aspects of the underlying physical molecular interactions. All methods and data will be made widely available to both academic and industrial investigators.
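
To make the idea of a pose as a hidden variable concrete, the following is a minimal illustrative sketch, not the project's actual method. It assumes a toy 2D setting in which a rigid ligand and a pocket are small sets of points, the pose is a rotation angle plus a translation, and the score is a sum of Gaussian proximity terms. All names (`transform`, `pose_score`, `best_pose`) and the scoring form are hypothetical. Because the true pose is unknown, the ligand is scored by its best-scoring pose among those sampled.

```python
import math
from itertools import product


def transform(points, angle, shift):
    """Rotate 2D points by `angle` (radians) about the origin, then translate by `shift`."""
    c, s = math.cos(angle), math.sin(angle)
    dx, dy = shift
    return [(c * x - s * y + dx, s * x + c * y + dy) for x, y in points]


def pose_score(ligand, pocket, width=1.0):
    """Toy score (an assumption): Gaussian reward for each ligand/pocket point pair."""
    return sum(
        math.exp(-((lx - px) ** 2 + (ly - py) ** 2) / (2 * width ** 2))
        for (lx, ly), (px, py) in product(ligand, pocket)
    )


def best_pose(ligand, pocket, angles, shifts):
    """Treat the pose as a hidden variable: return (score, angle, shift) of the best sampled pose."""
    return max(
        (
            (pose_score(transform(ligand, a, t), pocket), a, t)
            for a, t in product(angles, shifts)
        ),
        key=lambda item: item[0],
    )


# Hypothetical toy data: a two-point ligand and a two-point pocket.
ligand = [(0.0, 0.0), (1.0, 0.0)]
pocket = [(2.0, 2.0), (2.0, 3.0)]
angles = [k * math.pi / 2 for k in range(4)]      # coarse rotation grid
shifts = list(product(range(4), repeat=2))        # coarse translation grid

score, angle, shift = best_pose(ligand, pocket, angles, shifts)
```

Exhaustive enumeration like this only works for a toy grid. With real 3D conformations and alignments the pose space is far too large to enumerate, which is why the abstract treats search and optimization as a problem area of its own.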

National Institutes of Health (NIH)
National Institute of General Medical Sciences (NIGMS)
Research Project (R01)
Study Section
Special Emphasis Panel (ZRG1-BDMA (01))
Program Officer
Wehrle, Janna P
University of California San Francisco
Internal Medicine/Medicine
Schools of Medicine
San Francisco
United States
Cleves, Ann E; Jain, Ajay N (2015) Chemical and protein structural basis for biological crosstalk between PPARα and COX enzymes. J Comput Aided Mol Des 29:101-12
Cleves, Ann E; Jain, Ajay N (2015) Knowledge-guided docking: accurate prospective prediction of bound configurations of novel ligands using Surflex-Dock. J Comput Aided Mol Des 29:485-509
Yera, Emmanuel R; Cleves, Ann E; Jain, Ajay N (2014) Prediction of off-target drug effects through data fusion. Pac Symp Biocomput :160-71
Spitzer, Russell; Cleves, Ann E; Varela, Rocco et al. (2014) Protein function annotation by local binding site surface similarity. Proteins 82:679-94
Varela, Rocco; Cleves, Ann E; Spitzer, Russell et al. (2013) A structure-guided approach for protein pocket modeling and affinity prediction. J Comput Aided Mol Des 27:917-34
Spitzer, Russell; Jain, Ajay N (2012) Surflex-Dock: Docking benchmarks and real-world application. J Comput Aided Mol Des 26:687-99
Varela, Rocco; Walters, W Patrick; Goldman, Brian B et al. (2012) Iterative refinement of a binding pocket model: active computational steering of lead optimization. J Med Chem 55:8926-42
Jain, Ajay N; Cleves, Ann E (2012) Does your model weigh the same as a duck? J Comput Aided Mol Des 26:57-67
Yera, Emmanuel R; Cleves, Ann E; Jain, Ajay N (2011) Chemical structural novelty: on-targets and off-targets. J Med Chem 54:6771-85
Spitzer, Russell; Cleves, Ann E; Jain, Ajay N (2011) Surface-based protein binding pocket similarity. Proteins 79:2746-63

Showing the most recent 10 out of 25 publications