Binding-Site Modeling with Multiple-Instance Machine-Learning

Jain, Ajay

Abstract

This proposal is entitled Binding-Site Modeling with Multiple-Instance Machine-Learning. One of the most challenging and longest-studied problems in computer-aided drug design is predicting the binding affinity of small-molecule ligands for their cognate protein targets. Despite decades of work, quantitative structure-activity relationship (QSAR) approaches still suffer from poor accuracy, especially when predicting outside closely related series of molecules. Even with high-quality structures of target proteins, physics-based approaches are also far from robust and accurate enough for reliable use in drug lead optimization. This proposal will build upon a foundation in multiple-instance machine learning applied to computer-aided drug design problems and develop a robust, accurate, and practically applicable affinity prediction methodology. The methodology requires only ligand structures and associated activity data for training, and it induces a virtual protein binding site composed of molecular fragments. The virtual binding pocket (or pocketmol) is used in conjunction with a scoring function originally developed for molecular docking. The pocketmol configuration is chosen such that the optimal conformation and alignment of each training ligand (as judged by the docking scoring function) yields a score close to its known experimental value. Feasibility has been demonstrated in papers involving both membrane-bound receptors and enzymes. However, multiple challenges remain and are the subject of the proposed research. There are three key issues. First, many pocketmols satisfy the requirement of fitting the training data, so general solutions must be developed to address both the inductive bias of the learning procedure and model selection after the procedure.
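The fitting criterion described above can be illustrated with a toy multiple-instance regression. The sketch below is a minimal illustration under stated assumptions, not the proposal's method: each ligand is a "bag" of candidate poses described by hypothetical feature vectors, a linear score stands in for the docking scoring function, the weight vector `w` stands in for the pocketmol, and a ligand's predicted activity is the maximum score over its poses. `w` is fit by subgradient descent on squared error against the experimental values. The names `predict`, `mse`, and `fit` are illustrative only.

```python
import numpy as np


def predict(w, bags):
    """Score every pose in each bag and keep the best one per ligand.

    Each bag is an (n_poses, n_features) array of hypothetical pose
    descriptors; the linear score bag @ w is a toy stand-in for a docking
    scoring function evaluated against a pocket model parameterized by w.
    """
    return np.array([np.max(bag @ w) for bag in bags])


def mse(w, bags, y):
    """Mean squared error between max-over-poses predictions and activities."""
    return float(np.mean((predict(w, bags) - y) ** 2))


def fit(bags, y, n_iter=2000, lr=0.01, seed=0):
    """Fit w by subgradient descent on the multiple-instance squared error.

    At each step only the currently best-scoring pose of each ligand
    contributes to the gradient, mirroring the choice of an optimal pose
    under the current pocket model.
    """
    rng = np.random.default_rng(seed)
    w = rng.normal(scale=0.1, size=bags[0].shape[1])
    for _ in range(n_iter):
        grad = np.zeros_like(w)
        for bag, target in zip(bags, y):
            scores = bag @ w
            best = int(np.argmax(scores))  # pose selected by the current model
            grad += (scores[best] - target) * bag[best]
        w -= lr * grad / len(bags)
    return w


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    w_true = np.array([1.0, -2.0, 0.5])
    # 20 synthetic "ligands", each with 5 poses described by 3 features.
    bags = [rng.normal(size=(5, 3)) for _ in range(20)]
    y = predict(w_true, bags)
    # Same initial weights that fit() draws internally with seed=0.
    w0 = np.random.default_rng(0).normal(scale=0.1, size=3)
    w_fit = fit(bags, y)
    print(f"MSE before: {mse(w0, bags, y):.3f}  after: {mse(w_fit, bags, y):.3f}")
```

Because only the best-scoring pose of each ligand enters the loss, the pose used for each ligand can change as the model is refit; this is the multiple-instance aspect. The objective is not convex, so different starting points may settle on different models that fit the training data comparably well, a small-scale echo of the model-multiplicity issue raised above.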
Second, because any particular model is the product of a learning process, it will have a limited domain of applicability: some new molecules are likely to be predicted well and others poorly. Further, the model will be better informed by learning from certain new molecules than from others. We must develop methods for estimating the confidence of predictions for new molecules and for identifying particular molecules that will be highly informative. Third, the operational application of these methods involves model building, guided chemical synthesis, and iterative refinement of models. Convincing validation will require application to temporal series of molecules synthesized for multiple targets of pharmaceutical interest. The proposed work will develop novel methods to address these challenges and will establish extensive validation on multiple pharmaceutically relevant temporal series of small molecules that were the subject of real-world lead-optimization exercises.
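One common way to turn model multiplicity into the confidence and informativeness estimates called for above is a committee (ensemble) of models. The sketch below assumes several alternative models that all fit the training data, represented here as hypothetical Python callables; it uses the spread of their predictions as a confidence proxy and picks the candidates with the largest spread as the most informative to synthesize next (query-by-committee). This is a generic technique named plainly, not necessarily the approach developed in this project, and the function names are illustrative.

```python
import numpy as np


def committee_predictions(models, candidates):
    """Stack predictions from several models that all fit the training data.

    models: list of callables mapping a candidate to a predicted activity
    (hypothetical stand-ins for alternative pocket models).
    Returns an array of shape (n_models, n_candidates).
    """
    return np.array([[m(c) for c in candidates] for m in models])


def summarize(preds):
    """Consensus prediction and committee spread for each candidate.

    A larger spread (standard deviation across models) is read as lower
    confidence, i.e. the candidate lies further outside the models'
    shared domain of applicability.
    """
    return preds.mean(axis=0), preds.std(axis=0)


def most_informative(preds, k=1):
    """Query-by-committee: indices of the k candidates with the largest spread."""
    spread = preds.std(axis=0)
    return np.argsort(spread)[::-1][:k].tolist()


if __name__ == "__main__":
    # Three toy models that agree near zero and diverge as the input grows.
    models = [lambda x: 1.0 * x, lambda x: 1.2 * x, lambda x: 0.8 * x]
    candidates = [0.5, 1.0, 4.0]
    preds = committee_predictions(models, candidates)
    mean, spread = summarize(preds)
    print("consensus:", mean, "spread:", spread)
    print("synthesize next:", most_informative(preds, k=1))
```

Here the committee disagrees most on the largest input, so that candidate is selected first. Committee spread is only one possible confidence measure; similarity of a new molecule to the training set is another common choice, and in practice one may also weigh predicted potency alongside informativeness when choosing what to synthesize.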

Public Health Relevance

The dominant mode of therapeutic discovery involves the design of 'me-too' drugs that are very similar in structure and effect to existing drugs. To address the unmet medical needs of an aging population, novel therapeutics must be developed, and this will require much more creativity in the design process. The proposed research will develop a predictive computational framework to aid in the active design of structurally novel drug molecules during the lead optimization phase of drug discovery.

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Institute of General Medical Sciences (NIGMS)
Type: Research Project (R01)
Project #: 4R01GM101689-04
Application #: 8987578
Study Section: Biodata Management and Analysis Study Section (BDMA)
Program Officer: Wu, Mary Ann

Project Start: 2013-01-01
Project End: 2016-12-31
Budget Start: 2016-01-01
Budget End: 2016-12-31
Support Year: 4
Fiscal Year: 2016
Total Cost: $271,035
Indirect Cost: $100,035

Institution

Name: University of California San Francisco
Department: Pharmacology
Type: Schools of Pharmacy
DUNS #: 094878337

City: San Francisco
State: CA
Country: United States
Zip Code: 94118

Related projects


NIH 2020 R01 GM	Binding-Site Modeling with Multiple-Instance Machine-Learning Jain, Ajay N. / University of California San Francisco
NIH 2019 R01 GM	Binding-Site Modeling with Multiple-Instance Machine-Learning Jain, Ajay N. / University of California San Francisco
NIH 2018 R01 GM	Binding-Site Modeling with Multiple-Instance Machine-Learning Jain, Ajay N. / University of California San Francisco
NIH 2017 R01 GM	Binding-Site Modeling with Multiple-Instance Machine-Learning Jain, Ajay N. / University of California San Francisco
NIH 2016 R01 GM	Binding-Site Modeling with Multiple-Instance Machine-Learning Jain, Ajay N. / University of California San Francisco	$271,035
NIH 2015 R01 GM	Binding-Site Modeling with Multiple-Instance Machine-Learning Jain, Ajay N. / University of California San Francisco	$270,607
NIH 2014 R01 GM	Binding-Site Modeling with Multiple-Instance Machine-Learning Jain, Ajay N. / University of California San Francisco	$269,325
NIH 2013 R01 GM	Binding-Site Modeling with Multiple-Instance Machine-Learning Jain, Ajay N. / University of California San Francisco	$291,402

Publications

Cleves, Ann E; Jain, Ajay N (2018) Quantitative surface field analysis: learning causal models to predict ligand binding affinity and pose. J Comput Aided Mol Des :

Cleves, Ann E; Jain, Ajay N (2017) ForceGen 3D structure and conformer generation: from small lead-like molecules to macrocyclic drugs. J Comput Aided Mol Des 31:419-439

Cleves, Ann E; Jain, Ajay N (2016) Extrapolative prediction using physically-based QSAR. J Comput Aided Mol Des 30:127-52

Cleves, Ann E; Jain, Ajay N (2015) Chemical and protein structural basis for biological crosstalk between PPARα and COX enzymes. J Comput Aided Mol Des 29:101-12

Cleves, Ann E; Jain, Ajay N (2015) Knowledge-guided docking: accurate prospective prediction of bound configurations of novel ligands using Surflex-Dock. J Comput Aided Mol Des 29:485-509

Yera, Emmanuel R; Cleves, Ann E; Jain, Ajay N (2014) Prediction of off-target drug effects through data fusion. Pac Symp Biocomput :160-71

Spitzer, Russell; Cleves, Ann E; Varela, Rocco et al. (2014) Protein function annotation by local binding site surface similarity. Proteins 82:679-94

Varela, Rocco; Cleves, Ann E; Spitzer, Russell et al. (2013) A structure-guided approach for protein pocket modeling and affinity prediction. J Comput Aided Mol Des 27:917-34

Varela, Rocco; Walters, W Patrick; Goldman, Brian B et al. (2012) Iterative refinement of a binding pocket model: active computational steering of lead optimization. J Med Chem 55:8926-42

Jain, Ajay N; Cleves, Ann E (2012) Does your model weigh the same as a duck? J Comput Aided Mol Des 26:57-67

Showing the most recent 10 out of 11 publications
