We have developed databases and software for modeling protein three-dimensional structure and analyzing sequence-structure relationships. These tools have been distributed freely to biologists and developers of biotechnology software. The work may be divided into three areas: 1) continued development of the "PKB" object-oriented research database, 2) production of GenBank entries describing proteins and nucleic acids of known structure, and 3) design and implementation of a structural database convenient for developers of molecular modeling software. The PKB data specification has been expanded to include all data items defined by the Protein Data Bank, including scientific literature citations and bonded connectivity. We have also added validation procedures that recover correct definitions of biopolymer sequence and chemical modification and identify stereochemical anomalies. We have used PKB to produce ASN.1-language reports of structural features mappable to sequence, using the current NCBI/GenBank data specification. These data have been updated as new structures became available from the Protein Data Bank and incorporated into the widely distributed GenBank/ASN.1 and Entrez databases. To provide convenient access to structural data from modern application programs written in C, we have begun development of an ASN.1 database containing complete covalent and spatial structure data. Its specification allows comparison of biopolymer or non-biopolymer components of biological macromolecules according to chemical structure, and direct representation of three-dimensional structure inferred by alignment with homologous or chemically similar molecules. The significance of this work lies in providing biologists with easy access to structural data, and in providing a software infrastructure to researchers interested in sequence-structure relationships and to programmers developing molecular modeling software.
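As an illustration of what recovering a biopolymer sequence from coordinate data can involve, the following is a minimal sketch, not the PKB validation procedure itself. It assumes a hypothetical input of per-atom records given as (chain identifier, residue number, residue name) tuples in file order; the function name `sequences_from_residues` and the use of "X" to flag non-standard or chemically modified residues are choices made for this example only.

```python
# Standard three-letter to one-letter amino acid codes.
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def sequences_from_residues(records):
    """Derive one sequence string per chain from per-atom records.

    `records` is a hypothetical input format: an iterable of
    (chain_id, residue_number, residue_name) tuples in file order.
    Residues that are not among the 20 standard amino acids
    (for example, chemically modified residues) are reported as "X".
    """
    chains = {}   # chain_id -> list of one-letter codes
    seen = set()  # (chain_id, residue_number) pairs already counted
    for chain_id, res_num, res_name in records:
        key = (chain_id, res_num)
        if key in seen:
            continue  # further atoms of a residue already recorded
        seen.add(key)
        code = THREE_TO_ONE.get(res_name.upper(), "X")
        chains.setdefault(chain_id, []).append(code)
    return {chain_id: "".join(codes) for chain_id, codes in chains.items()}
```

A real validation step would also have to reconcile this coordinate-derived sequence with the sequence declared in the entry and account for residues missing from the coordinates, neither of which the sketch attempts.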
Marchler-Bauer, Aron; Anderson, John B; Chitsaz, Farideh et al. (2009) CDD: specific functional annotation with the Conserved Domain Database. Nucleic Acids Res 37:D205-10
Tyagi, Manoj; Shoemaker, Benjamin A; Bryant, Stephen H et al. (2009) Exploring functional roles of multibinding protein interfaces. Protein Sci 18:1674-83
Thompson, Kenneth Evan; Wang, Yanli; Madej, Tom et al. (2009) Improving protein structure similarity searches using domain boundaries based on conserved sequence information. BMC Struct Biol 9:33
Sayers, Eric W; Barrett, Tanya; Benson, Dennis A et al. (2009) Database resources of the National Center for Biotechnology Information. Nucleic Acids Res 37:D5-15
Fong, Jessica H; Geer, Lewis Y; Panchenko, Anna R et al. (2007) Modeling the evolution of protein domain architectures using maximum parsimony. J Mol Biol 366:307-15
Marchler-Bauer, Aron; Anderson, John B; Derbyshire, Myra K et al. (2007) CDD: a conserved domain database for interactive domain family analysis. Nucleic Acids Res 35:D237-40
Madej, Thomas; Panchenko, Anna R; Chen, Jie et al. (2007) Protein homologous cores and loops: important clues to evolutionary relationships between structurally similar proteins. BMC Struct Biol 7:23
Wang, Yanli; Addess, Kenneth J; Chen, Jie et al. (2007) MMDB: annotating protein sequences with Entrez's 3D-structure database. Nucleic Acids Res 35:D298-300
Kann, Maricel G; Sheetlin, Sergey L; Park, Yonil et al. (2007) The identification of complete domains within protein sequences using accurate E-values for semi-global alignment. Nucleic Acids Res 35:4678-85
Chakrabarti, Saikat; Bryant, Stephen H; Panchenko, Anna R (2007) Functional specificity lies within the properties and evolutionary changes of amino acids. J Mol Biol 373:801-10