In the last few years, the rapid accumulation of genome sequences and protein structures has been paralleled by major advances in sequence database search methods. The powerful Position-Specific Iterated BLAST (PSI-BLAST) method developed at the NCBI formed the basis of our work on protein motif analysis. A new mode of PSI-BLAST application was developed and implemented as an automatic procedure: it searches the database exhaustively by repeating PSI-BLAST iterations to convergence, using newly identified protein family members as queries. Another new procedure, IMPALA, reverses the PSI-BLAST approach and allows one to search a library of protein family profiles with an individual protein sequence as the query. These methods were applied to the systematic analysis of several classes of protein domains. A number of signaling domains previously considered to be specifically eukaryotic were shown to be detectable in archaea and/or bacteria. By combining domain detection with cross-genome comparison, these domains were classified as either ancestral or horizontally transferred. The evolutionary histories of the protein domains that comprise the repair systems and programmed cell death systems were investigated in detail. The DNA-binding domains encoded in archaeal genomes were also studied thoroughly, demonstrating that the repertoire of such domains in archaea resembles that in bacteria but not that in eukaryotes. A number of previously undetected domains and protein families were discovered, including the ACT domain, a multipurpose ligand-binding module involved in the allosteric regulation of a variety of enzymes, and a superfamily of predicted proteases from bacteria, archaea and eukaryotes that are homologous to animal transglutaminases. - Protein sequence motifs, iterative database search, fold recognition, multiple alignment