Chemical space is big data: the number of drug-like molecules is estimated to exceed 10^60. Experimentally screening compound libraries for drug candidates is a time-consuming and expensive process. Virtual screening is a cheaper, faster approach for identifying potential drug candidates, but existing virtual screening methods typically scale linearly with the size of the compound library. A virtual screen of a million compounds may take days and require a significant investment in computational infrastructure. The lack of scalable virtual screening algorithms and the difficulty of accessing the infrastructure necessary to perform large-scale virtual screening severely limit the ability of researchers to explore the big data of chemical space.

This research plan will develop scalable virtual screening algorithms that enable virtual screening on an interactive time scale (seconds to minutes). Interactive algorithms support the integration of expert human insight and knowledge with computational methods and permit rapid hypothesis testing and exploration. These interactive algorithms will be deployed both as open-source software and as part of an online drug discovery collaboration environment. The online environment will provide immediate access to the big data infrastructure needed for rapid and collaborative online virtual screening.

Algorithms for filtering compound libraries based on pharmacophore and molecular shape properties will be developed. Unlike current approaches, these algorithms will scale with the breadth and complexity of the query, not with the size of the compound database, enabling scalable and rapid filtering of billions of chemical structures. Efficient methods for ranking the filtered results that harness the computational power of modern graphics processing units will also be developed. Backed by the appropriate computational resources, these algorithms will support the screening of billions of chemical structures on an interactive time scale. The interactive performance of the tools will support rapid hypothesis testing and experimentation, and users will be able to submit their own compound libraries for screening, encouraging cross-discipline collaboration.
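
To illustrate how a filter can scale with the query and not with the library, the following is a minimal sketch of an inverted index over pharmacophore feature pairs. It is a toy illustration under stated assumptions, not the algorithm developed in this project: the bin width, the pairwise-distance keys, and the names `build_index` and `query` are all hypothetical. The index is built once over the library; each query then touches only the buckets named by its constraints, so lookup cost depends on the number of constraints and the size of the matching buckets.

```python
from collections import defaultdict
from itertools import combinations
from math import dist

BIN_WIDTH = 1.0  # angstroms; hypothetical bin size chosen for illustration


def _key(type_a, type_b, distance):
    """Order-independent key: two feature types plus a binned distance."""
    a, b = sorted((type_a, type_b))
    return (a, b, int(distance // BIN_WIDTH))


def build_index(library):
    """Build the index once, offline.

    library: {compound_id: [(feature_type, (x, y, z)), ...]}
    Returns a mapping from (type, type, distance_bin) to compound ids.
    """
    index = defaultdict(set)
    for cid, features in library.items():
        for (ta, pa), (tb, pb) in combinations(features, 2):
            index[_key(ta, tb, dist(pa, pb))].add(cid)
    return index


def query(index, constraints):
    """Return candidate compound ids for a list of pair constraints.

    constraints: [(type_a, type_b, min_dist, max_dist), ...]
    Only the buckets named by the constraints are read; the library
    itself is never scanned.
    """
    result = None
    for ta, tb, lo, hi in constraints:
        a, b = sorted((ta, tb))
        hits = set()
        for bin_id in range(int(lo // BIN_WIDTH), int(hi // BIN_WIDTH) + 1):
            hits |= index.get((a, b, bin_id), set())
        result = hits if result is None else result & hits
        if not result:
            break  # no compound can satisfy the remaining constraints
    return result or set()
```

Because distances are binned and each pair constraint is checked independently, the returned set is a superset of the true matches. A real pipeline would follow this coarse filter with an exact geometric check and a ranking step on the much smaller candidate set.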

Public Health Relevance

The proposed research will result in novel algorithms and systems for the storage, retrieval, and analysis of chemical data to support the rapid identification of compounds of therapeutic interest. Successful application of these algorithms will reduce the cost and time of development of new drugs.

Agency: National Institutes of Health (NIH)
Institute: National Institute of General Medical Sciences (NIGMS)
Type: Research Project (R01)
Project #: 5R01GM108340-03
Application #: 8847744
Study Section: Special Emphasis Panel (ZRG1-BST-N (52))
Program Officer: Preusch, Peter
Project Start: 2013-08-10
Project End: 2016-04-30
Budget Start: 2015-05-01
Budget End: 2016-04-30
Support Year: 3
Fiscal Year: 2015
Total Cost: $191,933
Indirect Cost: $49,769

Name: University of Pittsburgh
Department: Biology
Type: Schools of Medicine
DUNS #: 004514360
City: Pittsburgh
State: PA
Country: United States
Zip Code: 15213

Hochuli, Joshua; Helbling, Alec; Skaist, Tamar et al. (2018) Visualizing convolutional neural network protein-ligand scoring. J Mol Graph Model 84:96-108
Sunseri, Jocelyn; King, Jonathan E; Francoeur, Paul G et al. (2018) Convolutional neural network scoring and minimization in the D3R 2017 community challenge. J Comput Aided Mol Des :
Koes, David R; Dömling, Alexander; Camacho, Carlos J (2018) AnchorQuery: Rapid online virtual screening for small-molecule protein-protein interaction inhibitors. Protein Sci 27:229-232
Gau, David; Lewis, Taber; McDermott, Lee et al. (2018) Structure-based virtual screening identifies a small-molecule inhibitor of the profilin 1-actin interaction. J Biol Chem 293:2606-2616
Koes, David R; Vries, John K (2017) Evaluating amber force fields using computed NMR chemical shifts. Proteins 85:1944-1956
Ragoza, Matthew; Hochuli, Joshua; Idrobo, Elisa et al. (2017) Protein-Ligand Scoring with Convolutional Neural Networks. J Chem Inf Model 57:942-957
Koes, David R; Vries, John K (2017) Error assessment in molecular dynamics trajectories using computed NMR chemical shifts. Comput Theor Chem 1099:152-166
Sunseri, Jocelyn; Ragoza, Matthew; Collins, Jasmine et al. (2016) A D3R prospective evaluation of machine learning for protein-ligand scoring. J Comput Aided Mol Des 30:761-771
Hain, Ethan; Camacho, Carlos J; Koes, David Ryan (2016) Fragment oriented molecular shapes. J Mol Graph Model 66:143-154
Sunseri, Jocelyn; Koes, David Ryan (2016) Pharmit: interactive exploration of chemical space. Nucleic Acids Res 44:W442-W448

Showing the most recent 10 out of 14 publications