*** Proteins are aptly referred to as nature's building blocks. Understanding how proteins function is key to understanding life's processes. Conversely, understanding why proteins fail to function, or function incorrectly, tells us much about particular diseases and illnesses. It is therefore not surprising that the amount of experimental data available on proteins, like most other biological data, is growing exponentially as we attempt to unravel these mysteries. Effective use of these data requires usable and current databases. Most public databases contain only one form of data (sequence, structure, or, to a lesser extent, functional information) on a large variety of loosely classified proteins. While these resources are clearly very useful, other types of resources are needed that are publicly available yet can be maintained at very low cost. This project encompasses the development of one such database, which will contain an automated classification of proteins based on an extensive set of properties, more properties than are considered by any other resource. The Database of Classified Proteins (DCP) is a generalization of previous work supporting information specifically on the protein kinase family of enzymes (see http://www.sdsc.edu/kinases). The DCP will contain an extensive set of physical and derived properties (a composite property description), including sequence, structure, evolutionary information (e.g., correlated mutations), and functional sites (initially derived from structure but later predicted from the composite description). New protein structures deposited with the PDB will be, where possible, aligned to an existing composite property description defining a protein family. The outcome of this work will be an all-by-all comparison of proteins for which structures are known, derived using the compute resources of the National Partnership for Advanced Computational Infrastructure (NPACI).
That comparison will be publicly available as the DCP and maintained on a 24/7 basis. The DCP will provide new opportunities for comparing proteins, and visualization tools are proposed to make this possible through the Web. The DCP should provide new insights into protein function by revealing hitherto undiscovered functional sites. Further, the DCP should be useful as a fold recognition database for threading sequences of unknown structure onto known folds. The DCP will contain much more information than could hitherto be applied to the structure prediction problem. ***

Agency: National Science Foundation (NSF)
Institute: Division of Biological Infrastructure (DBI)
Application #: 9808706
Program Officer: Sylvia J. Spengler
Project Start:
Project End:
Budget Start: 1998-10-01
Budget End: 2001-09-30
Support Year:
Fiscal Year: 1998
Total Cost: $450,000
Indirect Cost:
Name: University of California San Diego
Department:
Type:
DUNS #:
City: La Jolla
State: CA
Country: United States
Zip Code: 92093