This project had two central focuses this year: 1) The PSI-BLAST program greatly increases the sensitivity of protein-database similarity searches. It constructs a multiple alignment from significant similarities to a query sequence, derives a "profile" from this alignment, and searches the database anew, using the profile as a query. We sought to accelerate this process by performing the initial search not against a standard sequence database but against a database of pre-constructed protein profiles, such as the Conserved Domain Database (CDD), maintained by NCBI. When any hits were found, the profiles constructed after this fast initial search were, on average, as sensitive in finding sequences related to the query as profiles constructed after multiple rounds of PSI-BLAST searching, which require much more time. The program for this purpose, Domain Enhanced Lookup Time Accelerated BLAST (DELTA-BLAST), has been made available on the NCBI web site. 2) We continued to investigate means of reducing the number of false-positive hits in PSI-BLAST searches through alignment trimming, which decreases the chance of profile corruption due to homologous over-extension.
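To make the notion of a profile derived from a multiple alignment concrete, the following is a minimal sketch in Python. It is not the PSI-BLAST or DELTA-BLAST implementation: it assumes an ungapped scoring model, a uniform background distribution over the 20 amino acids, and a fixed additive pseudocount, whereas the real programs use sequence weighting and data-dependent pseudocounts. The names `build_profile`, `score_ungapped`, and the `pseudocount` parameter are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_profile(alignment, pseudocount=1.0):
    """Build a per-column log-odds profile from equal-length aligned rows.

    Assumptions (illustrative only): uniform background frequencies and a
    fixed additive pseudocount per residue. Gap characters are ignored.
    """
    if not alignment:
        raise ValueError("alignment is empty")
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all aligned rows must have the same length")

    background = 1.0 / len(AMINO_ACIDS)
    profile = []
    for col in range(length):
        # Count only standard residues; '-' and other symbols are skipped.
        counts = Counter(
            row[col] for row in alignment if row[col] in AMINO_ACIDS
        )
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        column_scores = {}
        for aa in AMINO_ACIDS:
            freq = (counts[aa] + pseudocount) / total
            # Log-odds score in bits relative to the background.
            column_scores[aa] = math.log2(freq / background)
        profile.append(column_scores)
    return profile


def score_ungapped(profile, segment):
    """Sum the profile scores of a segment aligned without gaps.

    Residues outside the 20 standard amino acids contribute a score of 0.
    """
    if len(segment) != len(profile):
        raise ValueError("segment length must match profile length")
    return sum(col.get(aa, 0.0) for col, aa in zip(profile, segment))


if __name__ == "__main__":
    toy_alignment = ["ACDE", "ACDF", "ACDE"]  # made-up example rows
    toy_profile = build_profile(toy_alignment)
    print(score_ungapped(toy_profile, "ACDE"))
    print(score_ungapped(toy_profile, "WWWW"))
```

In this toy setting, a segment that matches the conserved columns scores higher than an unrelated one, which is the property that lets a profile serve as a more sensitive query than a single sequence.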
|Altschul, Stephen; Demchak, Barry; Durbin, Richard et al. (2013) The anatomy of successful computational biology software. Nat Biotechnol 31:894-7|
|Boratyn, Grzegorz M; Schaffer, Alejandro A; Agarwala, Richa et al. (2012) Domain enhanced lookup time accelerated BLAST. Biol Direct 7:12|
|Altschul, Stephen F; Gertz, E Michael; Agarwala, Richa et al. (2009) PSI-BLAST pseudocounts and the minimum description length principle. Nucleic Acids Res 37:815-24|