The principal objective of this project is to make large collections of small molecules available to aid drug development, both in-house and publicly, and to provide free chemoinformatics tools for working with such databases. The project started with posting the information in the Open NCI Database on the CADD Group's public web server but has moved far beyond this data set. Additional databases are currently being added to this resource, including large vendor catalogs of compounds that can be acquired for screening. Advanced processing is applied to the data, and powerful searching and display capabilities are being implemented. Collaborations with U.S. governmental and academic groups as well as with companies are underway to greatly increase the number and scope of calculated properties available in the framework of this project. These efforts are intended to make this a powerful resource for in silico screening and computer-aided drug design.
One type of interface to these databases will resemble the Enhanced NCI Database Browser (http://cactus.nci.nih.gov/ncidb2/). The nature of the resources currently being developed is exemplified by a brief description of this service. The data in the current web service comprise data from NCI's Developmental Therapeutics Program (DTP) and additional information with which we have augmented the DTP datasets. The NCI chemical structure database is a collection of about half a million structures, accumulated in computer-readable form during the past 45 years in the course of NCI's screening of compounds for anti-cancer and anti-AIDS activity. Approximately half of the database is covered by confidentiality agreements with the samples' suppliers, whereas the other half (the "Open NCI Database") is openly accessible, with the computer-readable structures being made available by DTP as public domain data.
We have subjected the Open NCI Database to various analyses that help to better understand its characteristics and to put it in perspective relative to other large databases used in computer-aided drug design and chemical information science. Various clustering methods have been applied to elucidate its diversity, and the results have been compared with those for other databases (www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&list_uids=11410049&dopt=Abstract). The Open NCI Database has been converted into various formats suitable for further processing, including 3D pharmacophore searching. We have also implemented a powerful public search tool for the Open NCI Database with a web interface based on the chemical information toolkit CACTVS (www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&list_uids=11855965&dopt=Abstract). Using just a web browser, the user can search about 250,000 structures by more than 600 criteria. We have greatly augmented the original DTP files with numerous additional data fields containing calculated, predicted or hyperlinked information. These data have also been made available in directly downloadable form. Links to several additional services for further processing have been implemented. An online 3D pharmacophore search capability has been built which, as far as we are aware, is currently unique on the web. Searchable predictions of more than 550 different biological activities, calculated by the program PASS for most of the quarter-million compounds, have been included in the web service (abstract: www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&list_uids=12546557&dopt=Abstract). A new service is our Chemical Structure Lookup Service (CSLS; http://cactus.nci.nih.gov/lookup/).
CSLS is essentially a "phone book" for small molecules, allowing users to quickly find out in which, if any, of over 100 different databases (both public and commercial), comprising more than 56 million entries, their compounds occur. Part of these projects is the downloading, reformatting and evaluation, for cancer-related purposes, of the massive set of structure and assay data deposited in PubChem (http://pubchem.ncbi.nlm.nih.gov/).
A recent addition to our collection of public tools is our Optical Structure Recognition service for molecules (http://cactus.nci.nih.gov/osra/). OSRA is a utility designed to convert graphical representations of chemical structures, as they appear in journal articles, patent documents, textbooks, trade magazines etc., into SMILES (Simplified Molecular Input Line Entry Specification; see http://en.wikipedia.org/wiki/SMILES), a computer-recognizable molecular structure format. OSRA can read a document in any of over 90 graphical formats, including GIF, JPEG, PNG, TIFF, PDF and PS, and generate the SMILES representation of the molecular structure images encountered within that document. Also new is our Structure Identifier Calculator & Converter (SICC; http://cactvs.nci.nih.gov/sicc/sicc.html), which allows the user to calculate our in-house developed NCI/CADD Structure Identifiers as they are used in CSLS. SICC also calculates or converts InChI and InChIKey (http://en.wikipedia.org/wiki/InChI) from various chemical structure representations. The URL of our public web server is http://cactus.nci.nih.gov.
Finally, we spearheaded efforts to implement a new resource for making affordable chemical synthesis of screening samples available to all NIH researchers. This has been realized as an extension of the contract with the company ChemNavigator, which has implemented the new Semi-Custom Synthesis Online Request System (SCSORS; www.chemnavigator.com/cnc/services/SCSORS_Overview.asp).
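The "phone book" idea behind CSLS can be illustrated with a minimal sketch of an inverted index that maps a structure identifier to the databases in which it occurs. This is only an illustration under stated assumptions, not a description of how CSLS is actually implemented: the database names and identifiers below are hypothetical placeholders, not real NCI/CADD Structure Identifiers.

```python
from collections import defaultdict


def build_index(databases):
    """Map each structure identifier to the sorted names of the databases containing it."""
    index = defaultdict(set)
    for db_name, identifiers in databases.items():
        for identifier in identifiers:
            index[identifier].add(db_name)
    return {identifier: sorted(names) for identifier, names in index.items()}


def lookup(index, identifier):
    """Return the databases listing this identifier, or an empty list if none do."""
    return index.get(identifier, [])


# Hypothetical toy data: names and identifiers are placeholders for illustration only.
toy_databases = {
    "open_nci": ["ID-0001", "ID-0002"],
    "vendor_a": ["ID-0002", "ID-0003"],
}
toy_index = build_index(toy_databases)
print(lookup(toy_index, "ID-0002"))
```

Building the index once makes each later lookup a single dictionary access, which is the property that a lookup service spanning many databases would need.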

Agency: National Institutes of Health (NIH)
Institute: National Cancer Institute (NCI)
Type: Intramural Research (Z01)
Project #: 1Z01BC010517-06
Application #: 7733064
Support Year: 6
Fiscal Year: 2008
Total Cost: $311,619
Name: National Cancer Institute Division of Basic Sciences
Country: United States
Richard, Ann M; Gold, Lois Swirsky; Nicklaus, Marc C (2006) Chemical structure indexing of toxicity data on the internet: moving toward a flat world. Curr Opin Drug Discov Devel 9:314-25
Poroikov, Vladimir V; Filimonov, Dmitrii A; Ihlenfeldt, Wolf-Dietrich et al. (2003) PASS biological activity spectrum predictions in the enhanced open NCI database browser. J Chem Inf Comput Sci 43:228-36