NCBI currently uses the local alignment tool "rps-BLAST" to search the Conserved Domain Database (CDD). Local alignment tools are inherently inappropriate for CDD retrieval, because complete domains are, by definition, the units conserved in evolution. Retrieval should therefore compare complete domains to protein subsequences, which is "semi-global" alignment. Accordingly, we developed a semi-global alignment algorithm. Dr Sergey Sheetlin implemented our method in a program called "GLOBAL". Dr Maricel Kann analyzed the retrieval efficacy of several competing methods, including HMMer, an implementation of hidden Markov models (HMMs), and showed that the retrieval efficacies are in the order HMMer (in global mode) = GLOBAL > rps-BLAST. GLOBAL is in fact a degenerate HMM. While retaining HMM retrieval efficacies, GLOBAL is simple enough to be accelerated by the same heuristics used in local alignment methods like BLAST. To these ends, we have developed novel statistical approximations for a semi-global alignment method that discovers whole protein domains within a query protein sequence, thereby giving clues to the function of novel protein sequences.
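
The distinction between local and semi-global alignment can be illustrated with a minimal dynamic-programming sketch. This is not the GLOBAL program: the function name `semi_global_align`, the plain-sequence domain, and the match, mismatch, and linear gap scores are all assumptions chosen for illustration, and a real CDD search would score against position-specific domain profiles and attach E-values. The sketch only shows the defining constraint: every domain position must be aligned, while unaligned query residues before and after the match are free.

```python
def semi_global_align(domain, query, match=2, mismatch=-1, gap=-2):
    """Align ALL of `domain` against the best-scoring subsequence of `query`.

    Returns (score, start, end) so that query[start:end] is the matched
    subsequence. Scoring values are illustrative assumptions only.
    """
    n, m = len(domain), len(query)
    # score[i][j]: best score aligning domain[:i] to a query subsequence ending at j
    # start[i][j]: query index where that subsequence begins
    score = [[0] * (m + 1) for _ in range(n + 1)]
    start = [list(range(m + 1)) for _ in range(n + 1)]

    # Row 0 stays at zero: skipping a query prefix costs nothing.
    # Column 0 is penalized: domain residues cannot be skipped for free.
    for i in range(1, n + 1):
        score[i][0] = i * gap
        start[i][0] = 0

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if domain[i - 1] == query[j - 1] else mismatch
            candidates = [
                (score[i - 1][j - 1] + sub, start[i - 1][j - 1]),  # residue pair
                (score[i - 1][j] + gap, start[i - 1][j]),          # gap in query
                (score[i][j - 1] + gap, start[i][j - 1]),          # gap in domain
            ]
            score[i][j], start[i][j] = max(candidates, key=lambda c: c[0])

    # Only the last row is eligible: the whole domain must be consumed,
    # but the alignment may end anywhere in the query (free query suffix).
    end = max(range(m + 1), key=lambda j: score[n][j])
    return score[n][end], start[n][end], end


print(semi_global_align("WGHE", "PAWHEAEWGHEKK"))  # (8, 7, 11)
print(semi_global_align("WGHE", "WG"))             # (0, 0, 2)
```

In the second call the query contains only half of the toy domain. A local aligner would report the two matching residues with a score of 4, whereas the semi-global score drops to 0 because the two missing domain positions are charged as gaps. That penalty for incomplete domains is the behavior the abstract argues for.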

Agency: National Institutes of Health (NIH)
Institute: National Library of Medicine (NLM)
Type: Intramural Research (Z01)
Project #: 1Z01LM091804-05
Application #: 7735084
Support Year: 5
Fiscal Year: 2008
Total Cost: $103,458
Name: National Library of Medicine
Country: United States
Kann, Maricel G; Sheetlin, Sergey L; Park, Yonil et al. (2007) The identification of complete domains within protein sequences using accurate E-values for semi-global alignment. Nucleic Acids Res 35:4678-85