We have developed databases and software for modeling protein 3-dimensional structure and analyzing sequence-structure relationships. These tools have been distributed freely to biologists and developers of biotechnology software. The work may be divided into three areas: 1) continued development of the "pKB" object-oriented research database, 2) production of "Entrez" sequences for proteins and nucleic acids with known structure, and 3) implementation of MMDB, a 3-dimensional structure database for "Entrez". pKB has been modified to include "core" motif identification, non-redundant set construction, and robust input of Protein Data Bank (PDB) files containing new classes of error. It has been used to produce ASN.1-language reports of structural features mappable to sequence, as needed for "Entrez", using the NCBI/GenBank sequence data specification. To include explicit 3-dimensional information in "Entrez" we have created the MMDB database. MMDB employs an ASN.1 data specification which defines precisely the chemical graph of biopolymers and non-polymer components such as substrates or cofactors, and maps them into a measured 3-dimensional space. In this way it facilitates generalized homology modeling, that is, inference of spatial structure based on comparison and alignment by chemical structure. MMDB provides data in a form convenient for "Entrez" and for developers of molecular modeling software, as ASN.1 data files and associated object loaders which automatically generate analogous in-memory C data structures. Chemical graphs are constructed largely by computation within pKB, since this information is implicit within the current PDB format, and we have validated the algorithms used.
The significance of this work is in providing biologists with easy access to structural data, and in providing a software infrastructure to researchers interested in sequence-structure relationships and programmers developing molecular modeling software.
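
The separation described in the abstract, between a chemical graph and the 3-dimensional coordinates mapped onto it, can be illustrated with a minimal sketch. This is not the MMDB ASN.1 specification or its generated C structures: the class names (`Atom`, `Bond`, `Molecule`, `CoordinateSet`), the field names, and the helper `located_atoms` are hypothetical, and the coordinates are illustrative values rather than measured data. Python is used only for brevity.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# All names below are hypothetical; they only illustrate the idea of a
# chemical graph that is stored separately from its 3-D coordinates.


@dataclass(frozen=True)
class Atom:
    atom_id: int   # identifier unique within one molecule
    element: str   # chemical element symbol, e.g. "C"
    name: str      # atom label, e.g. "CA" for an alpha carbon


@dataclass(frozen=True)
class Bond:
    atom_id_1: int
    atom_id_2: int
    order: int = 1  # 1 = single, 2 = double


@dataclass
class Molecule:
    """Chemical graph: atoms are nodes, bonds are edges."""
    name: str
    kind: str  # "biopolymer" or "non-polymer" (e.g. substrate, cofactor)
    atoms: Dict[int, Atom] = field(default_factory=dict)
    bonds: List[Bond] = field(default_factory=list)

    def add_atom(self, atom: Atom) -> None:
        self.atoms[atom.atom_id] = atom

    def add_bond(self, a: int, b: int, order: int = 1) -> None:
        # Refuse edges that point at atoms missing from the graph.
        if a not in self.atoms or b not in self.atoms:
            raise KeyError("both atoms must exist before bonding them")
        self.bonds.append(Bond(a, b, order))

    def neighbors(self, atom_id: int) -> List[int]:
        """Return the ids of atoms bonded to atom_id, in ascending order."""
        found = []
        for bond in self.bonds:
            if bond.atom_id_1 == atom_id:
                found.append(bond.atom_id_2)
            elif bond.atom_id_2 == atom_id:
                found.append(bond.atom_id_1)
        return sorted(found)


@dataclass
class CoordinateSet:
    """Maps atom ids of one molecule to (x, y, z) positions."""
    coords: Dict[int, Tuple[float, float, float]] = field(default_factory=dict)


def located_atoms(mol: Molecule, xyz: CoordinateSet):
    """Pair each atom that has a position with that position, by atom id."""
    return [
        (atom.name, xyz.coords[atom_id])
        for atom_id, atom in sorted(mol.atoms.items())
        if atom_id in xyz.coords
    ]


# A backbone fragment: N - CA - C = O
mol = Molecule(name="backbone fragment", kind="biopolymer")
for atom in (Atom(1, "N", "N"), Atom(2, "C", "CA"),
             Atom(3, "C", "C"), Atom(4, "O", "O")):
    mol.add_atom(atom)
mol.add_bond(1, 2)
mol.add_bond(2, 3)
mol.add_bond(3, 4, order=2)

# Illustrative positions only; atom 4 is left without a position to show
# that the graph stays complete even when a coordinate is missing.
xyz = CoordinateSet({1: (0.0, 0.0, 0.0),
                     2: (1.46, 0.0, 0.0),
                     3: (2.0, 1.4, 0.0)})

print(mol.neighbors(2))
print(located_atoms(mol, xyz))
```

Keeping the graph independent of the coordinate set means that two molecules can be compared or aligned by chemical structure alone, with positions attached afterwards, which is the general idea behind the comparison "by chemical structure" mentioned in the abstract.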

Agency
National Institutes of Health (NIH)
Institute
National Library of Medicine (NLM)
Type
Intramural Research (Z01)
Project #
1Z01LM000046-02
Application #
3759316
Study Section
Project Start
Project End
Budget Start
Budget End
Support Year
2
Fiscal Year
1994
Total Cost
Indirect Cost
Name
National Library of Medicine
Department
Type
DUNS #
City
State
Country
United States
Zip Code
Marchler-Bauer, Aron; Anderson, John B; Chitsaz, Farideh et al. (2009) CDD: specific functional annotation with the Conserved Domain Database. Nucleic Acids Res 37:D205-10
Tyagi, Manoj; Shoemaker, Benjamin A; Bryant, Stephen H et al. (2009) Exploring functional roles of multibinding protein interfaces. Protein Sci 18:1674-83
Thompson, Kenneth Evan; Wang, Yanli; Madej, Tom et al. (2009) Improving protein structure similarity searches using domain boundaries based on conserved sequence information. BMC Struct Biol 9:33
Sayers, Eric W; Barrett, Tanya; Benson, Dennis A et al. (2009) Database resources of the National Center for Biotechnology Information. Nucleic Acids Res 37:D5-15
Fong, Jessica H; Geer, Lewis Y; Panchenko, Anna R et al. (2007) Modeling the evolution of protein domain architectures using maximum parsimony. J Mol Biol 366:307-15
Marchler-Bauer, Aron; Anderson, John B; Derbyshire, Myra K et al. (2007) CDD: a conserved domain database for interactive domain family analysis. Nucleic Acids Res 35:D237-40
Madej, Thomas; Panchenko, Anna R; Chen, Jie et al. (2007) Protein homologous cores and loops: important clues to evolutionary relationships between structurally similar proteins. BMC Struct Biol 7:23
Wang, Yanli; Addess, Kenneth J; Chen, Jie et al. (2007) MMDB: annotating protein sequences with Entrez's 3D-structure database. Nucleic Acids Res 35:D298-300
Kann, Maricel G; Sheetlin, Sergey L; Park, Yonil et al. (2007) The identification of complete domains within protein sequences using accurate E-values for semi-global alignment. Nucleic Acids Res 35:4678-85
Chakrabarti, Saikat; Bryant, Stephen H; Panchenko, Anna R (2007) Functional specificity lies within the properties and evolutionary changes of amino acids. J Mol Biol 373:801-10
