NCBI currently uses the local alignment tool "rps-BLAST" to search its CDD (Conserved Domain Database). Local alignment tools are inherently inappropriate for CDD retrieval, because *complete* domains (by definition) are the units conserved in evolution. Thus, retrieval should compare complete domains to protein subsequences, which is "semi-global" alignment. Accordingly, we developed a semi-global alignment method. Dr Sergey Sheetlin implemented our method in a program called "GLOBAL". Dr Maricel Kann analyzed the retrieval efficacy of several competing methods, including HMMer, an implementation of hidden Markov models (HMMs), and showed that the retrieval efficacies are in the order HMMer (in global mode) = GLOBAL > rps-BLAST. GLOBAL is in fact a degenerate HMM. While retaining the retrieval efficacy of HMMs, GLOBAL is as fast as local alignment methods and can be accelerated by the same heuristics. The NCBI structure group is currently moving to implement GLOBAL as the primary CDD search engine.
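To make the distinction concrete, the following is a minimal sketch of semi-global scoring, in which the whole domain must be aligned but any prefix or suffix of the protein may be skipped at no cost. It is not the GLOBAL program: the function name, the match/mismatch/gap values, and the linear gap penalty are illustrative assumptions, and no E-value computation is attempted.

```python
def semi_global_score(domain, protein, match=1, mismatch=-1, gap=-2):
    """Best score aligning ALL of `domain` to ANY contiguous part of `protein`.

    Scoring parameters are illustrative assumptions, not those of GLOBAL.
    """
    # Row 0 is all zeros: skipping a prefix of the protein is free.
    prev = [0] * (len(protein) + 1)
    for i in range(1, len(domain) + 1):
        # Column 0: domain residues with no protein partner each cost a gap.
        curr = [prev[0] + gap]
        for j in range(1, len(protein) + 1):
            sub = match if domain[i - 1] == protein[j - 1] else mismatch
            curr.append(max(prev[j - 1] + sub,   # align the two residues
                            prev[j] + gap,       # gap in the protein
                            curr[j - 1] + gap))  # gap in the domain
        prev = curr
    # Maximum over the last row: skipping a suffix of the protein is free.
    return max(prev)


print(semi_global_score("ACD", "XXACDXX"))  # 3: the complete domain is found
print(semi_global_score("ACDE", "XXACD"))   # 1: the missing final residue is penalized
```

In the second call a local alignment would still report the full score of 3 for the partial match "ACD", whereas the semi-global score drops because the domain is incomplete in the protein.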
Kann, Maricel G; Sheetlin, Sergey L; Park, Yonil et al. (2007) The identification of complete domains within protein sequences using accurate E-values for semi-global alignment. Nucleic Acids Res 35:4678-85