This award provides funding for the development of effective and efficient algorithms to analyze large chemical compound databases and identify the compounds most likely to display the desired drug-like behavior. These virtual screening algorithms build on a substructure-based classification framework that uses (i) highly efficient frequent subgraph discovery algorithms that mine the chemical compounds for all the substructures (topological or geometric) critical to the classification task, (ii) feature selection and generation algorithms that combine multiple criteria to identify and synthesize a set of substructure-based features that simplify the representation of the original compounds while retaining and exposing their key features, and (iii) kernel-based approaches that take into account the relationships between these substructures at different levels of granularity and complexity. The research is integrated with an educational plan that introduces undergraduate and graduate students to the computational and data analysis aspects of virtual screening, machine learning, and data mining through courses, summer institutes, and research opportunities.
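
The three-stage framework described above can be illustrated with a deliberately simplified sketch. It is not the project's implementation. It assumes that each compound has already been reduced to a set of substructure labels (the label strings below are hypothetical), so the frequent subgraph mining step is replaced by a plain support count, and it uses the Tanimoto coefficient as one common example of a kernel over binary substructure features. The function names are illustrative only.

```python
from collections import Counter


def frequent_substructures(compounds, min_support):
    """Stage (i), simplified: keep substructure labels that occur in at
    least `min_support` compounds. Real frequent subgraph mining works on
    molecular graphs; here the labels are assumed to be given."""
    counts = Counter()
    for substructures in compounds:
        counts.update(set(substructures))
    return sorted(s for s, c in counts.items() if c >= min_support)


def to_features(substructures, vocabulary):
    """Stage (ii), simplified: binary presence vector over the vocabulary."""
    present = set(substructures)
    return [1 if s in present else 0 for s in vocabulary]


def tanimoto_kernel(x, y):
    """Stage (iii), simplified: Tanimoto similarity of two binary vectors,
    i.e. shared features divided by features present in either vector."""
    both = sum(a & b for a, b in zip(x, y))
    either = sum(a | b for a, b in zip(x, y))
    return both / either if either else 0.0


# Hypothetical toy data: each compound is a set of substructure labels.
compounds = [
    {"benzene", "hydroxyl", "amine"},
    {"benzene", "hydroxyl"},
    {"benzene", "carboxyl"},
]

vocabulary = frequent_substructures(compounds, min_support=2)
vectors = [to_features(c, vocabulary) for c in compounds]
similarity = tanimoto_kernel(vectors[0], vectors[2])
```

In a fuller pipeline, the pairwise kernel values would feed a kernel-based classifier such as a support vector machine; that step is omitted here to keep the sketch free of external dependencies.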

Successful completion of this project will advance the drug development process in two ways: by developing computationally efficient and accurate classification algorithms that can replace or supplement biological-assay-based high-throughput screening (HTS) techniques, and by producing a general-purpose chemical compound classification software toolkit, made available to the public, that contains high-quality implementations of the algorithms developed. Combining existing HTS-based approaches with these virtual screening methods will allow a move away from purely random testing toward more meaningful, directed, iterative, rapid-feedback searches of subsets and focused libraries.

Agency: National Science Foundation (NSF)
Institute: Division of Information and Intelligent Systems (IIS)
Type: Standard Grant (Standard)
Application #: 0431135
Program Officer: Sylvia J. Spengler
Project Start:
Project End:
Budget Start: 2004-09-01
Budget End: 2009-08-31
Support Year:
Fiscal Year: 2004
Total Cost: $405,498
Indirect Cost:
Name: University of Minnesota Twin Cities
Department:
Type:
DUNS #:
City: Minneapolis
State: MN
Country: United States
Zip Code: 55455