The Protein Data Bank (PDB) contains the atomic structures of macromolecules. As of October 1991 there were 790 structural entries (196 Mbytes); if current growth rates persist, this number could grow to 10,000 by the end of the decade. The data provide opportunities for understanding biological function through, for example, comparative structural research. This work addresses several challenges in, first, making the PDB more accessible to molecular biologists, and crystallographers in particular, and, second, assisting in the management of increasing amounts of data. Several software developments are being undertaken in parallel but share the same class libraries. First, a new object-based PDB storage format provides suitable access to the levels of substructure found in macromolecules. Second, object-based software tools that interrogate and manipulate structural data, and assist in structure verification, are being derived from existing structured programs. Finally, a high-level query language provides intuitive and direct interaction with the PDB. Each aspect of software development proceeds by prototyping, followed by iterative cycles of laboratory testing and code modification. This work integrates state-of-the-art database research results, such as object-oriented databases and knowledge bases; software engineering results, such as component and glue; and collaborative work results, such as extended transaction models, to support cooperative scientific research. By permitting more intuitive data queries, these tools could precipitate the discovery of new structure-function relationships.

Agency: National Science Foundation (NSF)
Institute: Division of Information and Intelligent Systems (IIS)
Application #: 9116798
Program Officer: Program Director
Project Start:
Project End:
Budget Start: 1992-01-01
Budget End: 1995-06-30
Support Year:
Fiscal Year: 1991
Total Cost: $1,251,150
Indirect Cost:
Name: Columbia University
Department:
Type:
DUNS #:
City: New York
State: NY
Country: United States
Zip Code: 10027