NCBI currently uses the local alignment tool "rps-BLAST" to search its CDD (Conserved Domain Database). Local alignment tools are inherently inappropriate for CDD retrieval, because *complete* domains are, by definition, the units conserved in evolution. Retrieval should therefore compare complete domains to protein subsequences, a comparison known as "semi-global" alignment. Accordingly, we developed a semi-global alignment method, which Dr Sergey Sheetlin implemented in a program called "GLOBAL". Dr Maricel Kann analyzed the retrieval efficacy of several competing methods, including HMMer, an implementation of hidden Markov models (HMMs), and showed that the retrieval efficacies are in the order HMMer (in global mode) = GLOBAL > rps-BLAST. GLOBAL is in fact a degenerate HMM. While retaining the retrieval efficacy of HMMs, GLOBAL is as fast as local alignment methods and can be accelerated by the same heuristics. The NCBI structure group is currently moving to implement GLOBAL as the primary CDD search engine.
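
To make the distinction between local and semi-global alignment concrete, the following is a minimal Python sketch of semi-global scoring by dynamic programming. It is not the GLOBAL program and does not reproduce its scoring or E-value statistics. The function name `semi_global_score`, the match/mismatch scores, and the linear gap penalty are all illustrative assumptions; a CDD search would score against a position-specific domain profile rather than a plain string.

```python
def semi_global_score(domain, protein, match=2, mismatch=-1, gap=-2):
    """Best score for aligning ALL of `domain` to ANY contiguous part of `protein`.

    Illustrative only: the scoring values are assumptions, not GLOBAL's parameters.
    """
    m = len(protein)
    # Row 0 is all zeros: the alignment may start anywhere in the protein for free.
    prev = [0] * (m + 1)
    for i in range(1, len(domain) + 1):
        # Column 0: domain residues with no protein residue to pair with cost gaps.
        curr = [i * gap] + [0] * m
        for j in range(1, m + 1):
            s = match if domain[i - 1] == protein[j - 1] else mismatch
            curr[j] = max(
                prev[j - 1] + s,    # align the two residues
                prev[j] + gap,      # domain residue against a gap
                curr[j - 1] + gap,  # protein residue against a gap
            )
        prev = curr
    # The whole domain has been consumed; the alignment may end anywhere in the protein.
    return max(prev)


if __name__ == "__main__":
    print(semi_global_score("ACD", "XXACDXX"))    # complete domain present: 6
    print(semi_global_score("ACDEF", "XXACDXX"))  # truncated domain is penalized: 4
```

In the second call a local alignment would still report the perfect `ACD` match with a score of 6, whereas the semi-global score drops to 4 because the unmatched domain residues `E` and `F` must be accounted for. This is the sense in which semi-global alignment favors complete domains.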

Agency
National Institutes of Health (NIH)
Institute
National Library of Medicine (NLM)
Type
Intramural Research (Z01)
Project #
1Z01LM091804-03
Application #
7316281
Study Section
(CBB)
Support Year
3
Fiscal Year
2006
Name
National Library of Medicine
Country
United States
Kann, Maricel G; Sheetlin, Sergey L; Park, Yonil et al. (2007) The identification of complete domains within protein sequences using accurate E-values for semi-global alignment. Nucleic Acids Res 35:4678-85