We have developed databases and software for comparative analysis of protein three-dimensional structure. These tools are distributed freely to biologists and to developers of biotechnology software. MMDB (Molecular Modeling DataBase) is the 3D-structure component of the Entrez molecular biology retrieval system. MMDB is an ASN.1 database in which all data items describing macromolecular structure are validated and explicitly listed. As a result, application software need not contain the complex logic required to retrieve this information from text formats such as PDB files. Ongoing curatorial work has added accurate taxonomy assignments for the macromolecular structures in MMDB. Ongoing software development has added new message and data types for transmitting structure-structure alignment data to local viewers. It has also added Pubstruct, a highly automated monthly update and indexing system.
Cn3D ("see in three dimensions") is a multi-structure visualization program. It is distributed as part of the Entrez client software and as a stand-alone version launched via the MIME protocol from World-Wide-Web Entrez. It differs from other public-domain viewers in two ways. It can display multiple aligned structures from Entrez's "structure neighbor" database, and it supports simultaneous highlighting and picking in multiple sequence and multiple structure alignments. Another feature is on-the-fly alignment of the sequences of homologs, so that an Entrez user can easily map conserved sequence features onto the known 3D structure. To help molecular biologists identify important structure-function relationships within protein families, we have added core-structure alignment editing and threading tools to Cn3D's sequence display windows. These tools also support curation of CDD (the Conserved Domain Database). The latest version also provides greatly improved molecular graphics performance on popular computing platforms.
A new "related structures" link was added to the NCBI BLAST servers this year to provide easy-to-use mapping to 3D structure whenever possible. It is based on a continuously updated database of precomputed BLAST alignments between protein sequences in the Entrez/PubMed system and all protein structures to which they are similar. Future development will consider the use of more accurate structure-based alignment methods. A completely re-engineered version of the Conserved Domain Architecture Retrieval Tool (CDART) database was also developed this year. It provides domain annotations derived from the Conserved Domain Database for MMDB protein structure displays. It also provides domain annotations for all protein sequences in the Entrez/PubMed retrieval system. As of September 2006, Cn3D had been downloaded more than 400,000 times. Entrez's structure database resources are used by an average of 40,000 users per day, at rates averaging about 5,000 web hits per hour.

Agency: National Institutes of Health (NIH)
Institute: National Library of Medicine (NLM)
Type: Intramural Research (Z01)
Project #: 1Z01LM000046-14
Application #: 7316234
Study Section: (CBB)
Support Year: 14
Fiscal Year: 2006
Organization Name: National Library of Medicine
Country: United States
Marchler-Bauer, Aron; Anderson, John B; Chitsaz, Farideh et al. (2009) CDD: specific functional annotation with the Conserved Domain Database. Nucleic Acids Res 37:D205-10
Tyagi, Manoj; Shoemaker, Benjamin A; Bryant, Stephen H et al. (2009) Exploring functional roles of multibinding protein interfaces. Protein Sci 18:1674-83
Thompson, Kenneth Evan; Wang, Yanli; Madej, Tom et al. (2009) Improving protein structure similarity searches using domain boundaries based on conserved sequence information. BMC Struct Biol 9:33
Sayers, Eric W; Barrett, Tanya; Benson, Dennis A et al. (2009) Database resources of the National Center for Biotechnology Information. Nucleic Acids Res 37:D5-15
Fong, Jessica H; Geer, Lewis Y; Panchenko, Anna R et al. (2007) Modeling the evolution of protein domain architectures using maximum parsimony. J Mol Biol 366:307-15
Marchler-Bauer, Aron; Anderson, John B; Derbyshire, Myra K et al. (2007) CDD: a conserved domain database for interactive domain family analysis. Nucleic Acids Res 35:D237-40
Madej, Thomas; Panchenko, Anna R; Chen, Jie et al. (2007) Protein homologous cores and loops: important clues to evolutionary relationships between structurally similar proteins. BMC Struct Biol 7:23
Wang, Yanli; Addess, Kenneth J; Chen, Jie et al. (2007) MMDB: annotating protein sequences with Entrez's 3D-structure database. Nucleic Acids Res 35:D298-300
Kann, Maricel G; Sheetlin, Sergey L; Park, Yonil et al. (2007) The identification of complete domains within protein sequences using accurate E-values for semi-global alignment. Nucleic Acids Res 35:4678-85
Chakrabarti, Saikat; Bryant, Stephen H; Panchenko, Anna R (2007) Functional specificity lies within the properties and evolutionary changes of amino acids. J Mol Biol 373:801-10

The list above gives the 10 most recent of 19 publications.