We heuristically derived two new equations for the scale parameter λ. These equations estimate λ efficiently and with high accuracy. In addition, we proposed several new formulas for the Gumbel pre-factor k, based on a path reversal identity and the Poisson clumping heuristic. These formulas also give very accurate results. We also explored edge effects on the statistics. Edge effects are relevant because real sequences have finite lengths, which generates a correction term in the asymptotic expansion of the probability of sequence matching. Edge effects are likely to be more important in the statistics of matching with gaps than in the statistics of matching without gaps, because gapped matches tend to be longer and therefore exhaust the sequences being matched more easily. The NCBI CoreTools now has code that calculates all the modified Gumbel parameters to practical accuracy in less than 1 second.
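To make the role of the two Gumbel parameters concrete, the following is a minimal sketch of how a scale parameter and a pre-factor typically enter a significance calculation. It uses the standard Gumbel approximation for local alignment scores, E = k·m·n·exp(−λ·y) and P = 1 − exp(−E), and is not the NCBI CoreTools code. The function names, the `edge_length` argument (a simple subtraction from each sequence length, standing in for a finite-size correction), and the numeric values in the usage lines are assumptions chosen for illustration only.

```python
import math


def gumbel_e_value(score, lam, k, m, n, edge_length=0.0):
    """Expected number of chance local alignments scoring at least `score`.

    lam and k are the Gumbel scale parameter and pre-factor; m and n are
    the two sequence lengths. edge_length is a hypothetical finite-size
    correction that shortens each sequence (floored at 1).
    """
    m_eff = max(m - edge_length, 1.0)
    n_eff = max(n - edge_length, 1.0)
    return k * m_eff * n_eff * math.exp(-lam * score)


def gumbel_p_value(score, lam, k, m, n, edge_length=0.0):
    """P(best local alignment score >= score) = 1 - exp(-E)."""
    e_value = gumbel_e_value(score, lam, k, m, n, edge_length)
    # expm1 keeps precision when the E-value is very small.
    return -math.expm1(-e_value)


# Illustrative values only; they are not fitted to any scoring system.
p_plain = gumbel_p_value(score=50, lam=0.3, k=0.1, m=200, n=300)
p_edge = gumbel_p_value(score=50, lam=0.3, k=0.1, m=200, n=300, edge_length=30)
```

In this sketch the edge correction shrinks the effective search space, so `p_edge` is smaller than `p_plain`. The relative change grows as the correction becomes comparable to the sequence lengths, which is consistent with the point above that longer gapped matches make edge effects more important.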
Spouge, John L (2007) Inequalities on the Overshoot beyond a Boundary for Independent Summands with Differing Distributions. Stat Probab Lett 77:1486-1489
Sheetlin, Sergey; Park, Yonil; Spouge, John L (2005) The Gumbel pre-factor k for gapped local alignment can be estimated from simulations of global alignment. Nucleic Acids Res 33:4987-94
Frith, Martin C; Spouge, John L; Hansen, Ulla et al. (2002) Statistical significance of clusters of motifs represented by position specific scoring matrices in nucleotide sequences. Nucleic Acids Res 30:3214-24
Park, Yonil; Spouge, John L (2002) The correlation error and finite-size correction in an ungapped sequence alignment. Bioinformatics 18:1236-42
Makalowska, I; Ferlanti, E S; Baxevanis, A D et al. (1999) Histone Sequence Database: sequences, structures, post-translational modifications and genetic loci. Nucleic Acids Res 27:323-4
Wolfsberg, T G; Makalowska, I; Makalowski, W (1999) Genomes and evolution. Web alert. Curr Opin Genet Dev 9:619