This project aims to develop a method for identifying patterns that represent the structural correlates of protein functional domains. These patterns will be constructed from protein primary sequences annotated with structural inferences derived from those same sequences. The approach rests on the observation that common functions generally correlate with common protein structures, domains, and/or regions of invariant or equivalent amino acids, even among functionally related proteins with very different primary sequences. The proposed method involves comparative analysis of sets of functionally related proteins to find a pattern consisting of elements of the common structure, invariant amino acids, and other properties that can be predicted statistically from the primary sequence. The approach draws on molecular genetics, biochemistry, and computer science. The project will begin by extending newly developed methods that have proven successful in initial applications, and will culminate in a pattern-indexed library of protein functional domain pattern descriptors, together with the software required to identify these patterns in new sequences. This library will aid in identifying the function(s) and domain substructure of newly sequenced DNA coding regions, a task that will become increasingly important given the ease and anticipated rate of genome sequencing in the near future.

Advances in biotechnology have led to the determination of the primary structure, or ordering of amino acids, of many types of proteins. A computer-based pattern recognition program would analyze known primary structures to find common patterns; this information could then be applied to the study of newly determined sequences and to the design of new proteins with specific functions. This capability would increase fundamental knowledge in biology and lead to improved methods in biotechnology.
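
The comparative step described above (finding positions that are invariant, or occupied by equivalent amino acids, across a set of functionally related proteins, then scanning new sequences for the resulting pattern) can be illustrated with a minimal sketch. It assumes the sequences are already aligned to equal length with `-` marking gaps, and it uses hypothetical residue groupings (`EQUIVALENCE_CLASSES`) and a PROSITE-like regular-expression encoding chosen for illustration. It is not the project's method, and it ignores the statistically predicted structural properties the abstract also describes.

```python
import re

# Hypothetical residue groupings used to call two residues "equivalent";
# these particular classes are an illustrative assumption.
EQUIVALENCE_CLASSES = [
    set("ILVM"),  # aliphatic hydrophobic
    set("FWY"),   # aromatic
    set("KRH"),   # basic
    set("DE"),    # acidic
    set("ST"),    # small hydroxyl
    set("NQ"),    # amides
]


def column_element(column):
    """Return a regex element describing one alignment column."""
    residues = set(column)
    if "-" in residues:
        # Gapped columns are treated as unconstrained in this sketch.
        return "."
    if len(residues) == 1:
        return residues.pop()  # invariant amino acid
    for group in EQUIVALENCE_CLASSES:
        if residues <= group:
            # All residues in the column share one property class.
            return "[" + "".join(sorted(group)) + "]"
    return "."  # no shared property: any residue allowed


def build_pattern(aligned):
    """Derive a regular-expression pattern from equal-length aligned sequences."""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("sequences must be aligned to equal length")
    elements = [column_element(col) for col in zip(*aligned)]
    # Trim unconstrained ends so the pattern starts and ends on conserved positions.
    while elements and elements[0] == ".":
        elements.pop(0)
    while elements and elements[-1] == ".":
        elements.pop()
    return "".join(elements)


def find_matches(pattern, sequence):
    """Return every start offset (overlaps included) where the pattern occurs."""
    return [m.start() for m in re.finditer("(?=" + pattern + ")", sequence)]


if __name__ == "__main__":
    family = ["GKSTL", "GKTTI", "GRSTV"]  # toy aligned sequences, not real data
    motif = build_pattern(family)
    print(motif)  # G[HKR][ST]T[ILMV]
    print(find_matches(motif, "MAGRTTMQQGKSTV"))  # [2, 9]
```

Encoding each column as either a fixed residue, a residue class, or a wildcard keeps the descriptor readable and lets a standard regular-expression engine do the scanning; a fuller treatment would also weight columns and add predicted secondary-structure elements.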

Agency
National Science Foundation (NSF)
Institute
Division of Biological Infrastructure (DBI)
Application #
8715633
Program Officer
Gerald Selzer
Budget Start
1988-09-01
Budget End
1991-09-01
Fiscal Year
1987
Total Cost
$496,497
Name
Dana-Farber Cancer Institute
City
Boston
State
MA
Country
United States
Zip Code
02215