We propose to develop a protein identification resource. It will contain an expert computer system for protein identification, which will incorporate an identification paradigm, suitable computer programs, organizational structures (including correlations and patterns in the information), protein sequences, amino acid compositions, and ancillary biochemical and biological knowledge. We also propose to develop a system of programs to make predictions of medical significance based mainly on the Resource knowledge, including secondary structure, antigenic sites, recognition domains and cross-reactivity of antibodies, best nucleic acid sequence probes, and possible restriction enzyme cut sites of coding regions. Finally, we plan to develop a computer system using the knowledge base that will facilitate associative browsing, the development of scientific insight, and the rejection of false hypotheses. Collaborative research will involve two theoretical projects to quantitate the use of additional data, from amino acid composition and from predicted secondary structures, to improve the power of the identification system. Two other projects involve the examination of new kinds of experimental data to make identifications. A workshop on computer methods will be held in the first year to suggest new collaborative projects. We will continue the on-line public access to our protein sequence knowledge base. We will publish a Newsletter to familiarize users with the system. Our goal is to develop a system so easy to use that biochemists all over the world will perform their own routine identifications using telephone networks. The great explosion in the accumulation of structural data bears witness that investigators, over 4,000 of them, think that the information is important in their many different fields, including virology, immunology, pharmacology, oncology, genetics, genetic engineering, biochemistry, physiology, and pathology.
Protein structures contain important information required for understanding the causes of disease and developing a rational approach to treatment. These data are essential in the design of cures based on information macromolecules, which can be specific to the individual or to the particular type of virus, cancer, autoimmune disease, or inborn error of metabolism.

Agency
National Institutes of Health (NIH)
Institute
National Library of Medicine (NLM)
Type
Biotechnology Resource Grants (P41)
Project #
5P41LM005206-09
Application #
2237716
Study Section
Special Emphasis Panel (SSS (F))
Project Start
1984-02-01
Project End
1995-01-31
Budget Start
1992-02-01
Budget End
1995-01-31
Support Year
9
Fiscal Year
1992
Name
National Biomedical Research Foundation
City
Washington
State
DC
Country
United States
Zip Code
20007
George, D G; Barker, W C; Mewes, H W et al. (1994) The PIR-International Protein Sequence Database. Nucleic Acids Res 22:3569-73
Heumann, K; George, D; Mewes, H W (1994) A new concept of sequence data distribution on wide area networks. Comput Appl Biosci 10:519-26
Ryden, L G; Hunt, L T (1993) Evolution of protein complexity: the blue copper-containing oxidases and related proteins. J Mol Evol 36:41-66
Barker, W C; George, D G; Mewes, H W et al. (1993) The PIR-International databases. Nucleic Acids Res 21:3089-92
Barker, W C; George, D G; Mewes, H W et al. (1992) The PIR-International Protein Sequence Database. Nucleic Acids Res 20 Suppl:2023-6
Garavelli, J S (1991) Molecular modeling on the Commodore Amiga. J Mol Graph 9:24-6, 36
Barker, W C; George, D G; Hunt, L T et al. (1991) The PIR protein sequence database. Nucleic Acids Res 19 Suppl:2231-36