A continuing focus of the project this year was investigating ways to improve the retrieval accuracy of DELTA-BLAST through the use of "model surgery" and of asymmetric but uniform gap costs. Traditionally, PSI-BLAST constructs its position-specific score matrix (PSSM) using the query sequence as a template, with each amino acid serving as a placeholder for a column of the matrix. However, if the query sequence includes an atypical insertion or deletion, the resulting PSSM is handicapped, because it must imply a corresponding deletion or insertion when aligning to most related sequences. The recently developed DELTA-BLAST first aligns a query sequence to a database of PSSMs, which opens the possibility of allowing the constructed PSSM to take its length from any aligned PSSMs rather than from the query. Furthermore, insertions and deletions with respect to PSSMs constructed using such model surgery can be treated asymmetrically, for example by penalizing insertions less than deletions. We achieved statistically significant improvements in retrieval accuracy using this approach. An updated article on the BLAST algorithm and programs was written for the Encyclopedia of Life Sciences.
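The idea of asymmetric gap costs can be illustrated with a minimal sketch. This is not the DELTA-BLAST implementation: the function name `local_align_score`, the toy PSSM, the default mismatch score of -4, and the linear (per-residue) gap costs are all assumptions chosen for illustration. The sketch scores a local alignment of a sequence against a PSSM, charging one position-independent cost for insertions (sequence residues aligned to no PSSM column) and a different one for deletions (PSSM columns skipped by the sequence).

```python
def local_align_score(seq, pssm, ins_cost=1, del_cost=3, default=-4):
    """Best local alignment score of seq against pssm.

    Hypothetical illustration only, with linear gap costs.
    pssm:     list of dicts, one per column, mapping residue -> score.
    ins_cost: cost per sequence residue aligned to no PSSM column.
    del_cost: cost per PSSM column skipped by the sequence.
    default:  assumed score for residues not listed in a column.
    """
    n_cols = len(pssm)
    prev = [0] * (n_cols + 1)
    best = 0
    for residue in seq:
        curr = [0] * (n_cols + 1)
        for j in range(1, n_cols + 1):
            match = prev[j - 1] + pssm[j - 1].get(residue, default)
            insertion = prev[j] - ins_cost      # residue left unaligned
            deletion = curr[j - 1] - del_cost   # column j skipped
            curr[j] = max(0, match, insertion, deletion)
            best = max(best, curr[j])
        prev = curr
    return best


# Toy four-column PSSM; each column strongly prefers one residue.
toy_pssm = [{"A": 5}, {"C": 5}, {"D": 5}, {"E": 5}]

print(local_align_score("ACDE", toy_pssm))    # 20: no gaps
print(local_align_score("ACWDE", toy_pssm))   # 19: one insertion, cost 1
print(local_align_score("ADE", toy_pssm))     # 12: one deletion, cost 3
```

With these assumed costs, a sequence carrying an extra residue loses less score than one missing a column, which is the kind of asymmetry described above.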

Support Year: 19
Fiscal Year: 2014
Name: National Library of Medicine
Shah, Nidhi; Altschul, Stephen F; Pop, Mihai (2018) Outlier detection in BLAST hits. Algorithms Mol Biol 13:7
Altschul, Stephen; Demchak, Barry; Durbin, Richard et al. (2013) The anatomy of successful computational biology software. Nat Biotechnol 31:894-7
Boratyn, Grzegorz M; Schaffer, Alejandro A; Agarwala, Richa et al. (2012) Domain enhanced lookup time accelerated BLAST. Biol Direct 7:12
Altschul, Stephen F; Gertz, E Michael; Agarwala, Richa et al. (2009) PSI-BLAST pseudocounts and the minimum description length principle. Nucleic Acids Res 37:815-24