Historically, in computational biology the fast Fourier transform (FFT) has been used almost exclusively to count the number of exact letter matches between two biosequences. In contrast, this project explores an FFT algorithm that can compute the match score of a sequence against a position-specific scoring matrix (PSSM). The algorithm finds the PSSM score simultaneously over all offsets of the PSSM with the sequence, although like all previous FFT algorithms, it still disallows gaps. Although the algorithm is presented in the context of global matching, it can be adapted to local matching without gaps. As a benchmark, the PSSM-modified FFT algorithm computed pairwise match scores. In timing experiments, the most efficient FFT implementation for pairwise scoring appeared to be 10 to 26 faster than a traditional FFT implementation, with only a factor of 2 in the acceleration attributable to a previously known compression scheme. Many important algorithms for detecting biosequence similarities, e.g., gapped BLAST or PSIBLAST, have a heuristic screening phase that disallows gaps. We are currently exploring FFT algorithms in these screening contexts
Rajasekaran, S; Jin, X; Spouge, J L (2002) The efficient computation of position-specific match scores with the fast fourier transform. J Comput Biol 9:23-33 |