Scalable tools for the analysis of chemical compounds using graph-based querying

Lindstrom, William

Abstract

Our current capacity to generate chemical and structural biological data far exceeds our capability to meaningfully assimilate it. The data describe molecules and biological macromolecules and their associated properties. A principle common to the structure of all chemical and biological macromolecular entities is the composition of objects related by energetic interaction. A natural representation of all such entities is a graph composed of nodes related by edges. We have developed powerful, scalable techniques that operate on graph databases for efficient similarity searching (Closure-tree), identification of statistically significant subgraphs (GraphRank), and query specification (GraphQL). These techniques apply directly to chemical and structural biological data, which are naturally represented as graphs. We have demonstrated the validity of the approach in prior work, and its feasibility in our Phase 1 research. The overall goal of this project is to deliver powerful, innovative problem-solving tools to medicinal chemists, structural biologists, and drug discovery researchers synthesizing ever-increasing amounts of chemical, biochemical, structural biological, cell biological, and clinical data. Phase 1 of this project is ongoing and highly successful. We have demonstrated that the Closure-tree and GraphRank algorithms are effective on chemical compound databases of realistic, industrial size. We have developed methods to exploit our knowledge of the nature of chemical databases. Using these methods we have improved similarity query performance time by over an order of magnitude. We have identified several specific aims to pursue in Phase 2 of our research.
We have rapidly established a professional software development and research infrastructure and developed the tools necessary to support progress toward the goal of solving important problems hindering medicinal chemists and structural biologists conducting modern drug discovery research for the development of new therapeutics. We will pursue four specific aims in our Phase 2 research. (1) We will develop specific additional functionality for Closure-tree and GraphRank, and integrate GraphQL into our chemical and structural bioinformatics tool set. The results of this aim will be used to (2) develop methods and functionality to represent chemical, structural biology, systems biology, and glycobiology data as graphs. Building on these results, we will (3) apply our tool set to specific relevant research problems such as HIV-1 Protease inhibition, Avian Flu neuraminidase inhibition, and p53-protein interactions. Finally, we will (4) assemble a state-of-the-art chemical and structural biological informatics tool set with detailed documentation and relevant case studies. The outcome of this research will be powerful, innovative new tools in the hands of medicinal chemists, structural biologists, and modern drug discovery researchers in academia and the pharmaceutical industry. The tools address significant obstacles in the drug development process and will enable new discoveries and greatly advance the practice of cheminformatic and structural biological data analysis. Through a carefully developed market analysis described in our commercialization plan, we show a growing market for our tools and competitive advantages. Application of our techniques will have significant impact on the interpretation of structural biological data, on pharmaceutical research and modern drug discovery chemistry, and on human health care through the design of new drugs.
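The idea of representing a compound as a graph of nodes related by edges can be illustrated with a minimal sketch. It is not the Closure-tree, GraphRank, or GraphQL implementation described in the abstract; the helper names (`make_molecule`, `bond_fingerprint`, `tanimoto`) and the bond-label fingerprint are assumptions chosen only to show atoms as labeled nodes, bonds as labeled edges, and one simple way two such graphs might be compared.

```python
from collections import Counter


def make_molecule(atoms, bonds):
    """Build a labeled, undirected graph (hypothetical helper).

    atoms: list of element symbols, one per node.
    bonds: list of (i, j, order) tuples, one per edge.
    """
    adjacency = {i: {} for i in range(len(atoms))}
    for i, j, order in bonds:
        adjacency[i][j] = order
        adjacency[j][i] = order
    return {"atoms": atoms, "adjacency": adjacency}


def bond_fingerprint(mol):
    """Count each (element, bond order, element) edge label once per bond."""
    fingerprint = Counter()
    for i, neighbors in mol["adjacency"].items():
        for j, order in neighbors.items():
            if i < j:  # visit each undirected edge only once
                a, b = sorted((mol["atoms"][i], mol["atoms"][j]))
                fingerprint[(a, order, b)] += 1
    return fingerprint


def tanimoto(fp_a, fp_b):
    """Multiset Tanimoto coefficient: shared labels over the union."""
    shared = sum((fp_a & fp_b).values())
    total = sum((fp_a | fp_b).values())
    return shared / total if total else 1.0


# Heavy-atom graphs only; hydrogens are omitted for brevity.
ethanol = make_molecule(["C", "C", "O"], [(0, 1, 1), (1, 2, 1)])
acetaldehyde = make_molecule(["C", "C", "O"], [(0, 1, 1), (1, 2, 2)])

similarity = tanimoto(bond_fingerprint(ethanol), bond_fingerprint(acetaldehyde))
print(f"ethanol vs acetaldehyde: {similarity:.3f}")
```

A flat edge-label comparison like this ignores how the bonds are connected to one another, which is the information that graph-based similarity search and subgraph mining are designed to use.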

Public Health Relevance

Graph-based representation of chemical compounds results in a more accurate realization of the chemical space. The use of recent techniques in graph querying and mining will enable data analysis that can scale to millions of compounds. The developed system will integrate information on chemical compounds with biological activity and protein interaction networks, thus enabling cheaper and faster drug discovery.

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Institute of Mental Health (NIMH)
Type: Small Business Innovation Research Grants (SBIR) - Phase II (R44)
Project #: 9R44MH086121-02
Application #: 7539247
Study Section: Special Emphasis Panel (ZRG1-BST-E (10))
Program Officer: Stirratt, Michael J

Project Start: 2007-09-01
Project End: 2011-08-31
Budget Start: 2008-09-10
Budget End: 2009-08-31
Support Year: 2
Fiscal Year: 2008
Total Cost: $518,950

Institution

Name: Acelot, Inc.
DUNS #: 784692001

City: Santa Barbara
State: CA
Country: United States
Zip Code: 93111

Related projects


NIH 2010 R44 MH	Scalable tools for the analysis of chemical compounds using graph-based querying	Lang, Christian / Acelot, Inc.	$183,715
NIH 2009 R44 MH	Scalable tools for the analysis of chemical compounds using graph-based querying	Lindstrom, William Maxwell / Acelot, Inc.	$420,569
NIH 2008 R44 MH	Scalable tools for the analysis of chemical compounds using graph-based querying	Lindstrom, William Maxwell / Acelot, Inc.	$518,950
