A problem that occasionally arises in PSI-BLAST searches is the "corruption" of the evolving sequence profile through the inclusion of non-homologous sequences in the PSI-BLAST multiple alignment. In previous years, corruption has been greatly reduced through improvements to PSI-BLAST statistics, most importantly by accounting for non-standard sequence composition. Recently, however, it has been observed in the literature that PSI-BLAST profiles may become corrupted through "homologous over-extension", a problem that cannot be remedied by improved statistics. In brief, this problem arises when the boundaries of an otherwise "true" alignment are miscalculated, yielding an alignment that is longer than it should be. If such an alignment extends into a domain in the subject sequence that occurs widely in the database, subsequent PSI-BLAST iterations can, in a ratchet-like manner, come to include the complete domain, even though it does not exist in the query sequence. The problem is due not to faulty statistics, but to faulty alignment. One solution to this problem has been proposed in the literature, but we have adopted what we consider a better remedy, based upon trimming alignments at each end by a certain number of bits. This year we have continued to test and refine our approach. By standard pooled ROC-n measures, we have achieved results better than those of the baseline PSI-BLAST program. However, analysis suggests that further improvement is possible with an approach that analyzes multiple PSI-BLAST hits simultaneously. Development of this method continues. No publications have yet resulted.
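
For readers unfamiliar with the pooled ROC-n measure mentioned above, the following is a minimal sketch of how such a score is commonly computed. It is not taken from the project's own code. The function names (`roc_n`, `pool_hits`), the hit representation as `(e_value, is_true_positive)` pairs, and the padding convention used when fewer than n false positives are found are all assumptions made for illustration. The sketch also assumes that "pooled" means merging the hits of all queries into one list ranked by E-value.

```python
from typing import Dict, Iterable, List, Tuple

Hit = Tuple[float, bool]  # (e_value, is_true_positive); hypothetical representation


def pool_hits(per_query: Dict[str, List[Hit]]) -> List[Hit]:
    """Merge the hits of all queries into a single list (assumed pooling scheme)."""
    return [hit for hits in per_query.values() for hit in hits]


def roc_n(hits: Iterable[Hit], n: int, total_true: int) -> float:
    """ROC-n score: normalized area under the ROC curve up to the n-th false positive.

    For each of the first n false positives, count the true positives ranked
    above it, sum those counts, and divide by n * total_true.
    """
    if n <= 0 or total_true <= 0:
        raise ValueError("n and total_true must be positive")

    ranked = sorted(hits, key=lambda hit: hit[0])  # lowest E-value first
    true_seen = 0
    false_seen = 0
    area = 0
    for _, is_true in ranked:
        if is_true:
            true_seen += 1
        else:
            false_seen += 1
            area += true_seen
            if false_seen == n:
                break

    # Assumed convention: if fewer than n false positives were retrieved,
    # the missing ones are treated as ranked below every retrieved hit.
    area += (n - false_seen) * true_seen
    return area / (n * total_true)


if __name__ == "__main__":
    per_query = {
        "query_a": [(1e-30, True), (1e-10, False)],
        "query_b": [(1e-20, True), (1e-5, True), (0.1, False)],
    }
    pooled = pool_hits(per_query)
    print(roc_n(pooled, n=2, total_true=3))
```

A score of 1.0 means every true positive was ranked above the first n false positives. Comparing the score of a modified search against that of the baseline on the same benchmark gives the kind of comparison described in the abstract.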

Support Year: 16
Fiscal Year: 2011
Total Cost: $59,961
Name: National Library of Medicine