Funds are sought to continue a computer-based study of protein evolution. The major goal is to establish the evolutionary roots of contemporary proteins. To this end, newly appearing protein sequences are entered into a computer and searched against up-to-date sequence data bases for potentially related sequences. Candidate matches are subjected to appropriate statistical measures to assess the likelihood that the similarities are not due to chance. Matched sequences are clustered into families, and evolutionary trees are constructed. During the past several years a large number of unexpected relationships have been uncovered, by us and many others, and it is now clear that the number of protein prototypes is manageable. Even with only 5,000 protein sequences in the banks, and even with a high degree of redundancy in the data bases (i.e., the same proteins from many species), the chances are now better than even that any newly determined sequence from a mammalian organism will be found to resemble a sequence already in the collection. In this regard, of the last 40 sequences entered into our collection, more than half resemble already reported proteins, species redundancies aside. This implies that we are already in a good position to classify sequences hierarchically with regard to their origins. We are especially interested in categorizing vertebrate blood plasma proteins, many of which are the result of a certain amount of "exon shuffling." This phenomenon presents certain technical problems to the sequence-comparer, in that the interchanged segments are often only 40-50 residues long. The proposal considers the need for simple procedures for recognizing shuffled exons, as well as other approaches designed to correct for structural biases that occasionally make unrelated sequences appear similar.
The need for simple procedures that can recognize relationships among proteins on the basis of their amino acid sequences alone will increasingly be felt as large-scale sequencing projects (the complete human genome, for example) are undertaken.
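The statistical step described above, assessing whether a candidate match is unlikely to be due to chance, can be illustrated with a shuffle test. The following is a minimal sketch in Python, not necessarily the procedure used in this project: it scores a pair of sequences with a simple local alignment, then compares that score with the scores obtained after repeatedly shuffling one sequence, which preserves composition but destroys order. The function names `local_score` and `shuffle_z_score`, the match, mismatch, and gap values, and the number of trials are all illustrative assumptions; a real comparison would normally use an amino acid substitution matrix.

```python
import random
import statistics


def local_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best Smith-Waterman local alignment score with a linear gap penalty.

    The scoring values are illustrative assumptions, not a real
    amino acid substitution matrix.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for ca in a:
        curr = [0]
        for j, cb in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best


def shuffle_z_score(a, b, trials=200, seed=0):
    """Z-score of the observed score against scores for shuffled copies of b.

    Shuffling keeps the residue composition of b but randomizes its order,
    so the shuffled scores estimate what chance alone would produce.
    """
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    observed = local_score(a, b)
    residues = list(b)
    null_scores = []
    for _ in range(trials):
        rng.shuffle(residues)
        null_scores.append(local_score(a, "".join(residues)))
    mean = statistics.mean(null_scores)
    sd = statistics.pstdev(null_scores)
    if sd == 0:
        return 0.0 if observed == mean else float("inf")
    return (observed - mean) / sd
```

A large positive value from `shuffle_z_score` suggests that the similarity exceeds what the residue compositions alone would produce. For short segments, such as shuffled exons of 40-50 residues, the chance-level scores lie closer to the observed score, which is one reason such matches are harder to establish.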