The NCBI CoreTools now contains our code, which calculates, to practical accuracy and in less than 1 second, all parameters of the modified Gumbel distribution (the Gumbel scale parameter λ, the pre-factor k, and the finite-size correction). The BLAST group has used our faster calculations to generate the modified Gumbel parameters for several new DNA scoring schemes. Our collaboration with Dr. Martin Frith has extended our methods to next-generation sequence matching, including frameshifts in DNA, a subject relevant to the NCBI BLAST services. In a practical test of our methods, our frameshift statistics found many novel human pseudogenes.
Spouge, John L; Mariño-Ramírez, Leonardo; Sheetlin, Sergey L (2014) Searching for repeats, as an example of using the generalised Ruzzo-Tompa algorithm to find optimal subsequences with gaps. Int J Bioinform Res Appl 10:384-408
Sheetlin, Sergey L; Park, Yonil; Frith, Martin C et al. (2014) Frameshift alignment: statistics and post-genomic applications. Bioinformatics