Chemical space is big data: the number of drug-like molecules exceeds 10^60. Experimentally screening compound libraries for drug candidates is a time-consuming and expensive process. Virtual screening is a cheaper, faster approach for identifying potential drug candidates. Existing virtual screening methods typically scale linearly with the size of the compound library. A virtual screen of a million compounds may take days and require a significant investment in computational infrastructure. The lack of scalable virtual screening algorithms and the difficulty of accessing the infrastructure necessary to perform large-scale virtual screening severely limit the ability of researchers to explore the big data of chemical space. This research plan will develop scalable virtual screening algorithms that will enable virtual screening on an interactive time scale (seconds to minutes). Interactive algorithms support the integration of expert human insight and knowledge with computational methods and permit rapid hypothesis testing and exploration. These interactive algorithms will be deployed both as open-source software and as part of an online drug discovery collaboration environment. The online environment will provide immediate access to the big data infrastructure needed to enable rapid and collaborative online virtual screening. Algorithms for filtering compound libraries based on pharmacophore and molecular shape properties will be developed. Unlike current approaches, these algorithms will scale with the breadth and complexity of the query, not with the size of the compound database, enabling scalable and rapid filtering of billions of chemical structures. Efficient methods for ranking the filtered results that harness the computational power of modern graphics processing units will also be developed. Backed by the appropriate computational resources, these algorithms will support the screening of billions of chemical structures on an interactive time scale.
The interactive performance of the tools will support rapid hypothesis testing and experimentation, and users will be able to submit their own compound libraries for screening, encouraging cross-discipline collaboration.

Public Health Relevance

The proposed research will result in novel algorithms and systems for the storage, retrieval, and analysis of chemical data to support the rapid identification of compounds of therapeutic interest. Successful application of these algorithms will reduce the cost and time of development of new drugs.

Agency: National Institutes of Health (NIH)
Institute: National Institute of General Medical Sciences (NIGMS)
Type: Research Project (R01)
Project #: 5R01GM108340-02
Application #: 8716786
Study Section: Special Emphasis Panel (ZRG1-BST-N (52))
Program Officer: Preusch, Peter
Project Start: 2013-08-10
Project End: 2016-04-30
Budget Start: 2014-05-01
Budget End: 2015-04-30
Support Year: 2
Fiscal Year: 2014
Total Cost: $191,702
Indirect Cost: $49,538

Name: University of Pittsburgh
Department: Biology
Type: Schools of Medicine
DUNS #: 004514360
City: Pittsburgh
State: PA
Country: United States
Zip Code: 15213

Koes, David Ryan; Camacho, Carlos J (2014) Shape-based virtual screening with volumetric aligned molecular shapes. J Comput Chem 35:1824-34