The principal objective of this project is to make large collections of small molecules available to aid drug development, both in-house and publicly. The project started with posting the information in the Open NCI Database on the CADD Group's public web server, but is moving far beyond this data set. Additional databases are currently being added to this resource, including large vendor catalogs of compounds that can be acquired for screening. While the databases that we can currently make publicly available total close to 2 million compounds, all databases processed in this context together exceed 15 million compounds. Advanced processing is applied to the data, and powerful searching and display capabilities are being implemented.

Current efforts, often in collaboration with various groups and companies, are underway to greatly enhance the number and scope of associated calculated properties available within the framework of this project. These efforts are intended to make this a powerful resource for in silico screening and computer-aided drug design. One type of interface to these databases will resemble the Enhanced NCI Database Browser. The nature of the resources currently being developed is exemplified by a brief description of this service: The data in the current web service comprise data from NCI's Developmental Therapeutics Program (DTP) and additional information with which we have augmented the DTP datasets.

The NCI chemical structural database is a collection of about half a million structures, accumulated in computer-readable form during the past 45 years in the course of NCI's screening of compounds for anti-cancer (and recently also anti-AIDS) activity. For about 50% of these molecules, samples are available, e.g., for testing in assays.
Approximately half of the database is covered by confidentiality agreements with the samples' suppliers, whereas the other half (the "Open NCI Database") is openly accessible, with the computer-readable structures being made available by DTP as public domain data.

We have subjected the Open NCI Database to various analyses that help to better understand its characteristics and to put it in perspective relative to other large databases used in computer-aided drug design and chemical information sciences. Various clustering methods have been applied to it to elucidate its diversity, and the results have been compared with those for other databases. Internal duplication rates as well as mutual overlaps have been calculated for the entire set of databases, including the Open NCI Database. The Open NCI Database has been converted into various formats suitable for further processing, including 3D pharmacophore searching.

We have also implemented a powerful public search tool for the Open NCI Database with a web interface based on the chemical information toolkit CACTVS. Using just a web browser, the user is able to search about 250,000 structures by more than 600 criteria. We have greatly augmented the original DTP files with numerous additional data fields, whether calculated, predicted, or hyperlinked information. These data have also been made available in directly downloadable format. Links to several additional services for further processing have been implemented. An online 3D pharmacophore capability has been built, which, as far as we are aware, is currently unique on the web. Searchable predictions of more than 550 different biological activities, calculated by the program PASS for most of the quarter-million compounds, have been included in the web service.
Richard, Ann M; Gold, Lois Swirsky; Nicklaus, Marc C (2006) Chemical structure indexing of toxicity data on the internet: moving toward a flat world. Curr Opin Drug Discov Devel 9:314-25
Poroikov, Vladimir V; Filimonov, Dmitrii A; Ihlenfeldt, Wolf-Dietrich et al. (2003) PASS biological activity spectrum predictions in the enhanced open NCI database browser. J Chem Inf Comput Sci 43:228-36