A problem that occasionally arises in PSI-BLAST searches is the "corruption" of the evolving sequence profile through the inclusion of non-homologous sequences in the PSI-BLAST multiple alignment. In previous years, corruption has been greatly reduced by improvements to PSI-BLAST statistics, most importantly by accounting for non-standard sequence composition. Recently, however, it has been observed in the literature that PSI-BLAST profiles may become corrupted through "homologous over-extension", a problem that cannot be remedied by improved statistics. In brief, this problem arises when the boundaries of an otherwise "true" alignment are miscalculated, making the alignment longer than it should be. If such an alignment extends into a domain in the subject sequence that occurs widely in the database, subsequent PSI-BLAST iterations can, in a ratchet-like manner, come to include the complete domain, even though it does not exist in the query sequence. The problem is due not to faulty statistics, but to faulty alignment. One solution to this problem has been proposed in the literature, but we have adopted what we consider a better remedy, based upon trimming alignments at each end by a certain number of bits. Preliminary tests of this approach have been promising, and development of this method continues. No publications have yet resulted.
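
The abstract does not specify how the trimming is carried out, so the following is only a minimal sketch of one plausible reading of "trimming alignments at each end by a certain number of bits". It assumes that a per-column score in bits is available for each alignment column, and that columns are removed from each end until the removed columns account for at least a fixed number of bits. The function name `trim_alignment_ends` and the parameters `column_bits` and `trim_bits` are hypothetical and are not part of PSI-BLAST.

```python
from typing import List, Tuple


def trim_alignment_ends(column_bits: List[float], trim_bits: float) -> Tuple[int, int]:
    """Return the half-open column range (start, end) kept after trimming.

    Hypothetical illustration, not the PSI-BLAST implementation.
    column_bits: assumed per-column alignment scores, in bits.
    trim_bits:   assumed number of bits to remove from each end.
    """
    n = len(column_bits)

    # Trim from the left: drop columns until the dropped columns
    # together contribute at least trim_bits.
    start = 0
    removed = 0.0
    while start < n and removed < trim_bits:
        removed += column_bits[start]
        start += 1

    # Trim from the right in the same way, never crossing `start`.
    end = n
    removed = 0.0
    while end > start and removed < trim_bits:
        removed += column_bits[end - 1]
        end -= 1

    return start, end


if __name__ == "__main__":
    # Weakly scoring flanks around a strongly scoring core.
    scores = [1, -1, 1, 4, 5, 4, 1, 0, 1]
    start, end = trim_alignment_ends(scores, trim_bits=2)
    print(start, end, scores[start:end])
```

Under this reading, a weakly scoring flank contributes few bits per column, so a fixed bit budget removes many flank columns, while a strongly scoring end loses only one or two columns. Whether this matches the method under development is not stated in the abstract.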

National Library of Medicine
Shah, Nidhi; Altschul, Stephen F; Pop, Mihai (2018) Outlier detection in BLAST hits. Algorithms Mol Biol 13:7
Altschul, Stephen; Demchak, Barry; Durbin, Richard et al. (2013) The anatomy of successful computational biology software. Nat Biotechnol 31:894-7
Boratyn, Grzegorz M; Schaffer, Alejandro A; Agarwala, Richa et al. (2012) Domain enhanced lookup time accelerated BLAST. Biol Direct 7:12
Altschul, Stephen F; Gertz, E Michael; Agarwala, Richa et al. (2009) PSI-BLAST pseudocounts and the minimum description length principle. Nucleic Acids Res 37:815-24