A method for detecting conserved motifs in protein sequence databases and assessing their statistical significance was developed and implemented in the CAP (Consistent Alignment Parser) and MoST (Motif Search Tool) programs. The MoST procedure consists of iteratively abstracting from an alignment block a weight matrix representing the conserved motif, scanning the database with this matrix, and locating new segments to add to the alignment block. The last step is based upon the statistics of score distributions for position-dependent weight matrices. This technique was applied to the analysis of several protein classes. We showed that eukaryotic translation elongation factor EF1g contains a domain related to glutathione S-transferases (GST). Two motifs that are conserved in a vast class of GST-related proteins are defined, and a possible role for the GST activity of EF1g in the assembly of protein complexes involved in translation is proposed. We found that human tumor-specific nucleolar protein P120 contains a conserved S-adenosylmethionine-binding motif and belongs to a family of putative rRNA methyltransferases that may be involved in the control of proliferation of both eukaryotic and bacterial cells. We explored the evolution of bacterial hydrolytic dehalogenases, which are crucial for detoxification of xenobiotics. Two types of dehalogenases were shown to belong to large, distinct superfamilies of enzymes found in all organisms, each of which includes many previously uncharacterized proteins. One of these superfamilies contains transferases and oxidoreductases, in addition to hydrolases, thereby revealing an evolutionary connection between different enzyme classes. Two new superfamilies of nucleosidases were characterized, as well as an unexpected structural and evolutionary relationship between thymidine phosphorylases and anthranilate phosphoribosyltransferases.
We conclude that screening sequence databases with motifs, particularly in the form of position-dependent weight matrices derived from alignment blocks, extracts a significant amount of new information compared with standard procedures for pairwise similarity search. A general biological conclusion is that enzyme evolution involves a complex interplay between divergence and functional convergence. In many instances, evolutionarily related enzymes catalyze different reactions, whereas enzymes of similar specificity frequently appear to have different origins.
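The iterative procedure described above (derive a weight matrix from the alignment block, scan the database, add new segments, repeat) can be illustrated with a minimal sketch. This is not the MoST implementation. The function names, the uniform background frequencies, the pseudocount, and especially the fixed score `cutoff` are assumptions made for illustration; in the method described here, the decision to accept a segment rests on the statistics of score distributions for position-dependent weight matrices rather than on a hand-picked threshold.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pwm(block, pseudocount=1.0):
    """Build a log-odds weight matrix from equal-length, ungapped segments.

    Assumption: uniform background frequencies and a flat pseudocount.
    """
    width = len(block[0])
    background = 1.0 / len(AMINO_ACIDS)
    total = len(block) + pseudocount * len(AMINO_ACIDS)
    pwm = []
    for pos in range(width):
        counts = Counter(segment[pos] for segment in block)
        column = {
            aa: math.log(((counts[aa] + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        }
        pwm.append(column)
    return pwm


def score_segment(pwm, segment):
    """Sum the per-position weights; unknown residues contribute 0."""
    return sum(column.get(aa, 0.0) for column, aa in zip(pwm, segment))


def scan(pwm, sequences, cutoff):
    """Return every window whose score reaches the (hypothetical) cutoff."""
    width = len(pwm)
    hits = set()
    for seq in sequences:
        for start in range(len(seq) - width + 1):
            window = seq[start:start + width]
            if score_segment(pwm, window) >= cutoff:
                hits.add(window)
    return hits


def iterate_motif_search(seed_block, sequences, cutoff, max_rounds=10):
    """Rebuild the matrix and rescan until no new segments are found."""
    block = set(seed_block)
    for _ in range(max_rounds):
        pwm = build_pwm(sorted(block))
        expanded = block | scan(pwm, sequences, cutoff)
        if expanded == block:
            break
        block = expanded
    return sorted(block)
```

For example, `iterate_motif_search(["GKS"], ["MAGKSTT", "LLGKTAA"], cutoff=1.0)` starts from a single seed segment and picks up the related segment `GKT` in the first round; the second round finds nothing new and the loop stops.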