Chemical space is big data: the number of drug-like molecules exceeds 10^60. Experimentally screening compound libraries for drug candidates is a time-consuming and expensive process. Virtual screening is a cheaper, faster approach for identifying potential drug candidates. Existing virtual screening methods typically scale linearly with the size of the compound library. A virtual screen of a million compounds may take days and require a significant investment in computational infrastructure. The lack of scalable virtual screening algorithms and the difficulty of accessing the infrastructure necessary to perform large-scale virtual screening severely limit the ability of researchers to explore the big data of chemical space. This research plan will develop scalable virtual screening algorithms that will enable virtual screening on an interactive time scale (seconds to minutes). Interactive algorithms support the integration of expert human insight and knowledge with computational methods and permit rapid hypothesis testing and exploration. These interactive algorithms will be deployed both as open-source software and as part of an online drug discovery collaboration environment. The online environment will provide immediate access to the big data infrastructure needed to enable rapid and collaborative online virtual screening. Algorithms for filtering compound libraries based on pharmacophore and molecular shape properties will be developed. Unlike current approaches, these algorithms will scale with the breadth and complexity of the query, not with the size of the compound database, enabling scalable and rapid filtering of billions of chemical structures. Efficient methods for ranking the filtered results that harness the computational power of modern graphics processing units will also be developed. Backed by the appropriate computational resources, these algorithms will support the screening of billions of chemical structures on an interactive time scale.
The interactive performance of the tools will support rapid hypothesis testing and experimentation, and users will be able to submit their own compound libraries for screening, encouraging cross-discipline collaboration.

Public Health Relevance

The proposed research will result in novel algorithms and systems for the storage, retrieval, and analysis of chemical data to support the rapid identification of compounds of therapeutic interest. Successful application of these algorithms will reduce the cost and time of development of new drugs.

National Institutes of Health (NIH)
National Institute of General Medical Sciences (NIGMS)
Research Project (R01)
Study Section
Special Emphasis Panel (ZRG1-BST-N (52))
Program Officer
Preusch, Peter C
University of Pittsburgh
Schools of Medicine
United States
Ragoza, Matthew; Hochuli, Joshua; Idrobo, Elisa et al. (2017) Protein-Ligand Scoring with Convolutional Neural Networks. J Chem Inf Model 57:942-957
Koes, David R; Vries, John K (2017) Evaluating amber force fields using computed NMR chemical shifts. Proteins 85:1944-1956
Koes, David R; Vries, John K (2017) Error assessment in molecular dynamics trajectories using computed NMR chemical shifts. Comput Theor Chem 1099:152-166
Sunseri, Jocelyn; Ragoza, Matthew; Collins, Jasmine et al. (2016) A D3R prospective evaluation of machine learning for protein-ligand scoring. J Comput Aided Mol Des 30:761-771
Pirhadi, Somayeh; Sunseri, Jocelyn; Koes, David Ryan (2016) Open source molecular modeling. J Mol Graph Model 69:127-43
Hain, Ethan; Camacho, Carlos J; Koes, David Ryan (2016) Fragment oriented molecular shapes. J Mol Graph Model 66:143-54
Sunseri, Jocelyn; Koes, David Ryan (2016) Pharmit: interactive exploration of chemical space. Nucleic Acids Res 44:W442-8
Paiardini, Alessandro; Fiascarelli, Alessio; Rinaldo, Serena et al. (2015) Screening and in vitro testing of antifolate inhibitors of human cytosolic serine hydroxymethyltransferase. ChemMedChem 10:490-7
Rego, Nicholas; Koes, David (2015) 3Dmol.js: molecular visualization with WebGL. Bioinformatics 31:1322-4
Koes, David Ryan; Camacho, Carlos J (2014) Shape-based virtual screening with volumetric aligned molecular shapes. J Comput Chem 35:1824-34