Computer methods will be developed for the derivation of empirical amino acid similarity scoring matrices, and the effectiveness of using these matrices for the detection of evolutionary and functional relationships between protein sequences will be evaluated. It is well known that comparisons of protein sequences are much more sensitive when the similarity between specific amino acids is taken into consideration. A scoring matrix based on mutations compiled from protein sequence data available in 1978 has been found to be extremely useful for such comparisons; it represents evolutionarily accepted amino acid replacement frequencies and clearly reflects the physicochemical properties of the amino acids. This matrix is now routinely employed in many sequence comparison and database searching methods, and it has been analyzed to answer a number of evolutionary questions. Sequence comparison methods using this matrix have recently led to significant advances in cancer research and are being employed in many other disease-related areas, including AIDS research. The drastic increase in the amount of protein sequence data now available clearly warrants the recompilation of the mutation data matrix. The variations of the matrix elements among different taxonomic groups will also be examined. It has become increasingly apparent in recent years that the study of functional relationships between proteins, irrespective of evolutionary considerations, is also relevant to the understanding of mechanisms involved in the production of diseased states. The methodology used to compile mutation data matrices will be generalized to compile scoring matrices more closely reflecting purely functional requirements. The matrices compiled as a result of this work are expected to provide a more sensitive probe for the study of functional and evolutionary relationships between proteins.
All software and data generated in this work will be made available to the general scientific community through the Protein Identification Resource.
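As a rough illustration of what a scoring matrix built from replacement frequencies looks like, the sketch below compiles a log-odds table from a list of aligned residue pairs. It is a simplified stand-in, not the procedure used to build the 1978 mutation data matrix or the one proposed here: the function name `log_odds_matrix`, the `scale` parameter, and the toy input are all assumptions made for the example.

```python
import math
from collections import Counter


def log_odds_matrix(aligned_pairs, scale=10.0):
    """Compile a symmetric log-odds score table from aligned residue pairs.

    Hypothetical helper for illustration only; real mutation data matrices
    involve additional steps that are not modeled here.
    """
    pair_counts = Counter()
    for a, b in aligned_pairs:
        # Count each aligned pair in both directions so the table is symmetric.
        pair_counts[(a, b)] += 1
        pair_counts[(b, a)] += 1
    total = sum(pair_counts.values())

    # Background frequency of each residue, taken from the same pair counts.
    residue_freq = Counter()
    for (a, _), n in pair_counts.items():
        residue_freq[a] += n / total

    scores = {}
    for (a, b), n in pair_counts.items():
        observed = n / total                          # how often a and b are aligned
        expected = residue_freq[a] * residue_freq[b]  # chance pairing of a with b
        scores[(a, b)] = round(scale * math.log10(observed / expected))
    return scores


# Toy input (assumed data): leucine and isoleucine are aligned with each
# other more often than leucine and lysine.
pairs = (
    [("L", "L")] * 4 + [("I", "I")] * 4 + [("K", "K")] * 5
    + [("L", "I")] * 2 + [("L", "K")]
)
scores = log_odds_matrix(pairs)
```

Pairs that are aligned more often than chance predicts receive positive scores and rarely aligned pairs receive negative ones, which is the property that lets such a matrix reward physicochemically similar replacements during sequence comparison.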

Agency
National Institutes of Health (NIH)
Institute
National Institute of General Medical Sciences (NIGMS)
Type
Research Project (R01)
Project #
5R01GM037273-02
Application #
3292542
Study Section
(SSS)
Project Start
1987-08-01
Project End
1990-07-31
Budget Start
1988-08-01
Budget End
1989-07-31
Support Year
2
Fiscal Year
1988
Total Cost
Indirect Cost
Name
National Biomedical Research Foundation
Department
Type
DUNS #
City
Washington
State
DC
Country
United States
Zip Code
20007
George, D G; Barker, W C; Hunt, L T (1990) Mutation data matrix and its uses. Methods Enzymol 183:333-51