Historically, in computational biology the fast Fourier transform (FFT) has been used almost exclusively to count the number of exact letter matches between two biosequences. In contrast, this project explores an FFT algorithm that can compute the match score of a sequence against a position-specific scoring matrix (PSSM). The algorithm finds the PSSM score simultaneously over all offsets of the PSSM with the sequence, although like all previous FFT algorithms, it still disallows gaps. Although the algorithm is presented in the context of global matching, it can be adapted to local matching without gaps. As a benchmark, the PSSM-modified FFT algorithm computed pairwise match scores. In timing experiments, the most efficient FFT implementation for pairwise scoring appeared to be 10 to 26 faster than a traditional FFT implementation, with only a factor of 2 in the acceleration attributable to a previously known compression scheme. Many important algorithms for detecting biosequence similarities, e.g., gapped BLAST or PSIBLAST, have a heuristic screening phase that disallows gaps.

Agency
National Institute of Health (NIH)
Institute
National Library of Medicine (NLM)
Type
Intramural Research (Z01)
Project #
1Z01LM000091-03
Application #
6681384
Study Section
(CBB)
Project Start
Project End
Budget Start
Budget End
Support Year
3
Fiscal Year
2002
Total Cost
Indirect Cost
Name
National Library of Medicine
Department
Type
DUNS #
City
State
Country
United States
Zip Code
Rajasekaran, S; Jin, X; Spouge, J L (2002) The efficient computation of position-specific match scores with the fast fourier transform. J Comput Biol 9:23-33