The principal objective of this project, headed by Dr. Marc C. Nicklaus, Head, Computer-Aided Drug Design Group, is to make the information in the Open NCI Database available to aid drug development, both in-house and publicly. We use both the data from NCI's Developmental Therapeutics Program (DTP) and the additional information with which we have augmented the DTP datasets. Additional databases are currently being added to this resource, including large vendor catalogs of compounds that can be acquired for screening. Advanced processing is applied to the data, and powerful searching and display capabilities have been implemented.
The NCI chemical structural database is a collection of about half a million structures, accumulated in computer-readable form during the past 45 years in the course of NCI's screening of compounds for anti-cancer (and recently also anti-AIDS) activity. For about 50% of these molecules, samples are available, e.g., for testing in assays. Approximately half of the database is covered by confidentiality agreements with the samples' suppliers, whereas the other half (the "Open NCI Database") is openly accessible, with the computer-readable structures being made available by DTP as public domain data.
We have subjected the Open NCI Database to various analyses that help to better understand its characteristics and to put it in perspective relative to other large databases used in computer-aided drug design and chemical information sciences. Various clustering methods have been applied to it to elucidate its diversity, and the results have been compared with those for other databases. Internal duplication rates as well as mutual overlaps have been calculated for the entire set of databases, including the Open NCI Database. The Open NCI Database has been converted into various formats suitable for further processing, including 3D pharmacophore searching.
We have also implemented a powerful public search tool for the Open NCI Database with a web interface based on the chemical information toolkit CACTVS. Using just a web browser, the user can search about 250,000 structures by more than 600 criteria. We have greatly augmented the original DTP files with numerous additional data fields, whether calculated, predicted, or hyperlinked information. These data have also been made available in directly downloadable format. Links to several additional services for further processing have been implemented. An online 3D pharmacophore search capability has been built which, as far as we are aware, is currently unique on the web. Searchable predictions of more than 550 different biological activities, calculated by the program PASS for most of the quarter-million compounds, have been included in the web service. Current efforts, often in collaboration with various groups and companies, aim to greatly increase the total number of structures as well as the number and scope of associated calculated properties available in the framework of this project, and to enhance the search and display capabilities. These efforts are intended to make this a powerful resource for in silico screening and computer-aided drug design.

National Institutes of Health (NIH)
Division of Basic Sciences - NCI (NCI)
Intramural Research (Z01)
Basic Sciences
United States
Richard, Ann M; Gold, Lois Swirsky; Nicklaus, Marc C (2006) Chemical structure indexing of toxicity data on the internet: moving toward a flat world. Curr Opin Drug Discov Devel 9:314-25
Poroikov, Vladimir V; Filimonov, Dmitrii A; Ihlenfeldt, Wolf-Dietrich et al. (2003) PASS biological activity spectrum predictions in the enhanced open NCI database browser. J Chem Inf Comput Sci 43:228-36