PubChem contains the chemical structures of small organic molecules and information on their biological activities. It is intended to support the Molecular Libraries and Imaging component of the NIH Roadmap Initiative. PubChem's chemical structure database may be searched on the basis of descriptive terms, chemical properties, and chemical structure similarity. When possible, PubChem's chemical structure records are linked to other NCBI databases. These include, for example, the PubMed scientific literature database and NCBI's protein 3D structure database. PubChem also contains the results of biological assay (bioassay) experiments. PubChem is organized as three interconnected databases within the Entrez/PubMed information retrieval system. These are PubChem Substance, PubChem Compound, and PubChem BioAssay. More information about using each component database may be found by following the link:
Work during PubChem's first year focused on multiple related subprojects needed to make the system described above operational in a short time frame. These included, among others: design of a robust data exchange specification for chemical substance information and generic bioassay result data; design of archival databases for chemical structure and bioassay data; and design of an indexing procedure for integration of PubChem's three component databases into the Entrez search engine. Foundational work included development of validation and standardization procedures for processing "legacy" chemical structure and bioassay data. These procedures employed a mix of novel and commercially available software to produce uniform valence-bond models of chemical structures in the archive. Another essential subproject was the development of graphical display servers for chemical structures, both as individual "substance summary" displays and as graphical chemical-drawing components of Entrez record-summary displays.
Yet another included development of procedures for calculation of standard chemical properties, for example Lipinski-rule properties, and of standardized descriptors such as SMILES, InChI, and IUPAC systematic chemical names. Development work was also required to construct the chemical compound similarity detection process that supports chemical identity and similarity neighbors within the Entrez system. This work employed a mix of commercial and custom software, in particular to group compounds reasonably where data on stereochemistry and isotopic labeling may be incomplete. A vital subproject was the creation of a procedure to link chemical and trivial names provided in the input data to MeSH headings and substance names and, in turn, to articles in PubMed. These links have proven to be an extremely valuable tool for biologist users searching for information on the biological activities of chemical compounds, or alternatively for information on chemical compounds associated with diagnosis or treatment of disease or other biological processes. A final subproject involved development of a novel bioassay result browser. This viewer allows users to examine descriptions of various depositor-supplied biological assay protocol parameters and readouts, and to construct lists of "active" compounds according to thresholds specified by the depositor. Substances selected in this way, according to biological activity, may in turn be used in further Entrez queries. The initial public PubChem release in September 2004 included approximately 650,000 unique chemical structure records from ten "legacy" government and academic sources. The initial release also included a set of approximately 200 bioassays from the DTP/NCI collection, each providing cancer and HIV screening data on an average of 15,000 compounds.
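To make the Lipinski-rule properties mentioned above concrete, the following is a minimal sketch of a rule-of-five check over precomputed property values. It is not PubChem's implementation: the `CompoundProperties` record, the function names, and the example values are all assumptions made for illustration. Only the four widely cited thresholds (molecular weight at most 500 Da, logP at most 5, at most 5 hydrogen-bond donors, at most 10 hydrogen-bond acceptors) come from the rule itself.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class CompoundProperties:
    """Precomputed properties for one compound (hypothetical record layout)."""
    name: str
    molecular_weight: float  # daltons
    log_p: float             # octanol-water partition coefficient
    h_bond_donors: int
    h_bond_acceptors: int


def lipinski_violations(props: CompoundProperties) -> list[str]:
    """Return a label for each rule-of-five criterion the compound exceeds."""
    violations = []
    if props.molecular_weight > 500:
        violations.append("molecular_weight > 500")
    if props.log_p > 5:
        violations.append("log_p > 5")
    if props.h_bond_donors > 5:
        violations.append("h_bond_donors > 5")
    if props.h_bond_acceptors > 10:
        violations.append("h_bond_acceptors > 10")
    return violations


def passes_rule_of_five(props: CompoundProperties, max_violations: int = 1) -> bool:
    """Common reading of the rule: at most one violation is tolerated.

    The max_violations default is an assumption; set it to 0 for a strict check.
    """
    return len(lipinski_violations(props)) <= max_violations


# Illustrative inputs only; the numbers are approximate, not taken from the text.
small_molecule = CompoundProperties("aspirin (approximate values)", 180.16, 1.2, 1, 4)
large_molecule = CompoundProperties("hypothetical macrocycle", 1200.0, 7.5, 6, 15)

for compound in (small_molecule, large_molecule):
    print(compound.name, passes_rule_of_five(compound), lipinski_violations(compound))
```

Returning the list of violated criteria, rather than only a pass/fail flag, keeps the check usable both as a filter and as a per-compound annotation.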
Work during PubChem's second year focused on further refinement of the system's cheminformatic analysis tools and data presentation, and on development of a robust web-based deposition system. Cheminformatics milestones reached this year included development of a fully non-redundant "compound" database, incorporation of a chemical-structure sketching tool into PubChem's structure search system, addition of "shortcuts" to active compounds to simplify navigation of PubChem's BioAssay database, and graphical tools for browsing and selection of compounds based on bioactivity. PubChem's deposition system now supports fully automated user uploads of chemical structure and bioassay data, with an interactive "help" system and a staff of two bioassay curators to assist depositors in authoring comprehensive and searchable bioassay descriptions. During this year PubChem has attracted over 15 new depositors from government, academic, and commercial organizations. The total number of unique chemical structures in PubChem has grown to over 3.2 million, and new bioassay results from the Molecular Libraries Screening Center Network have begun to arrive. As of September 2005, PubChem is used by an average of 24,000 people per day, at an average rate of approximately 4,000 web hits per hour.

National Institutes of Health (NIH)
National Library of Medicine (NLM)
Intramural Research (Z01)
National Library of Medicine
United States
Church, Deanna M; Hillier, LaDeana W (2009) Back to Bermuda: how is science best served? Genome Biol 10:105
Wang, Yanli; Xiao, Jewen; Suzek, Tugba O et al. (2009) PubChem: a public information system for analyzing bioactivities of small molecules. Nucleic Acids Res 37:W623-33
Han, Lianyi; Wang, Yanli; Bryant, Stephen H (2009) A survey of across-target bioactivity results of small molecules in PubChem. Bioinformatics 25:2251-5
Sayers, Eric W; Barrett, Tanya; Benson, Dennis A et al. (2009) Database resources of the National Center for Biotechnology Information. Nucleic Acids Res 37:D5-15
Borodina, Yulia V; Bolton, Evan; Fontaine, Fabien et al. (2007) Assessment of conformational ensemble sizes necessary for specific resolutions of coverage of conformational space. J Chem Inf Model 47:1428-37
Cheng, Kenneth T; Menkens, Anne; Bryant, Steve et al. (2007) NIH MICAD initiative and guest author program opportunities. J Nucl Med 48:19N
Fontaine, Fabien; Bolton, Evan; Borodina, Yulia et al. (2007) Fast 3D shape screening of large chemical databases through alignment-recycling. Chem Cent J 1:12
Wheeler, David L; Barrett, Tanya; Benson, Dennis A et al. (2007) Database resources of the National Center for Biotechnology Information. Nucleic Acids Res 35:D5-12