PubChem provides a public repository of chemical-structure records contributed by more than 80 organizations. Processing is automated, which has allowed PubChem's substance database to grow to over 40 million records in less than 5 years. A critical aspect of chemical-structure processing is standardization of valence-bond models, which yields the unique tautomer and/or resonance form stored in PubChem's compound database. Standardization enables cross-linking of deposited records that represent identical chemicals and calculation of accurate comparison scores to detect chemicals with similar though not identical structures. PubChem's chemical structure databases can be searched by chemical name or structure and can display results as Entrez "docsum" lists, structure similarity diagrams, or detailed "summary" records that include biological activity information.

An informatics project undertaken this year has enabled rapid daily update of PubChem's chemical structure database. Depositors often provide millions of new or modified records per day, such that loading of deposited and standardized structures may take many minutes or hours. The redesigned system carefully schedules updates with replication and backup of the multiple PubChem database servers, as necessary to provide uninterrupted information services to users. Another informatics project still in progress calculates theoretical three-dimensional structures of compounds in PubChem. These will be used to calculate structural similarities based on three-dimensional conformer similarity, and in particular will support PubChem analysis tools that cluster chemical structures and their biological activities in multiple PubChem Bioassay records.

PubChem's Bioassay database is a repository for the results of chemical biology screening experiments, largely provided by grantees of the NIH Molecular Libraries roadmap program. The number of bioassay records has grown rapidly this year to over 1,100 records containing in total over 30 million tests of specific biological activities of individual chemical compounds. Bioassay records contain a description of the experimental protocol and are carefully curated to assure clarity of the experimental readouts provided in the data table associated with each record. Bioassay records are automatically neighbored to one another when they report one or more of the same chemicals as biologically active and/or when they link to target proteins or genes that are similar in sequence to one another. New Molecular Libraries grants were awarded in September 2008 and the growth of PubChem's Bioassay database is expected to continue.

An important informatics project undertaken this year has been to provide simpler links from compounds to information on their biological activity. A tool shown at the top of every Entrez "docsum" list of compounds provides "Bioactivity Analysis", a list of PubChem Bioassay records where one or more of the chemicals was tested, sorted so that experiments with the greatest number of bioactives appear at the top of the list. From the "Bioactivity Analysis" page a further link is provided to "Structure Activity", presenting an informative display of chemicals grouped by structural similarity and bioassays grouped by active-compound overlap or target sequence similarity. Another informatics project still underway improves NLM drug and toxicology information in compound "summary" and Entrez "docsum" displays.

The NIH Molecular Libraries roadmap project has supported another database in addition to PubChem, the Molecular Imaging and Contrast Agent Database or MICAD, presented as one of the collections in the Entrez "Books" database. MICAD is a collection of regularly formatted review articles, each describing an imaging agent with links to PubChem records for chemical structure and PubMed articles cited in the review. Curators on the MICAD team author the reviews and create PubChem records needed for each. The total number of MICAD review articles surpassed 500 in the summer of 2008.

Agency: National Institutes of Health (NIH)
Institute: National Library of Medicine (NLM)
Type: Intramural Research (Z01)
Project #: 1Z01LM100604-05
Application #: 7735086
Support Year: 5
Fiscal Year: 2008
Total Cost: $3,534,815
Name: National Library of Medicine
Country: United States
Han, Lianyi; Wang, Yanli; Bryant, Stephen H (2009) A survey of across-target bioactivity results of small molecules in PubChem. Bioinformatics 25:2251-5
Sayers, Eric W; Barrett, Tanya; Benson, Dennis A et al. (2009) Database resources of the National Center for Biotechnology Information. Nucleic Acids Res 37:D5-15
Church, Deanna M; Hillier, LaDeana W (2009) Back to Bermuda: how is science best served? Genome Biol 10:105
Wang, Yanli; Xiao, Jewen; Suzek, Tugba O et al. (2009) PubChem: a public information system for analyzing bioactivities of small molecules. Nucleic Acids Res 37:W623-33
Borodina, Yulia V; Bolton, Evan; Fontaine, Fabien et al. (2007) Assessment of conformational ensemble sizes necessary for specific resolutions of coverage of conformational space. J Chem Inf Model 47:1428-37
Cheng, Kenneth T; Menkens, Anne; Bryant, Steve et al. (2007) NIH MICAD initiative and guest author program opportunities. J Nucl Med 48:19N
Fontaine, Fabien; Bolton, Evan; Borodina, Yulia et al. (2007) Fast 3D shape screening of large chemical databases through alignment-recycling. Chem Cent J 1:12
Wheeler, David L; Barrett, Tanya; Benson, Dennis A et al. (2007) Database resources of the National Center for Biotechnology Information. Nucleic Acids Res 35:D5-12