The NCBI CoreTools now contains our code, which calculates, to practical accuracy and in less than 1 second, all parameters of the modified Gumbel distribution (the Gumbel scale parameter λ, the pre-factor k, and the finite-size correction). The BLAST group has used our faster calculations to generate the modified Gumbel parameters for several new DNA scoring schemes. Our collaboration with Dr. Martin Frith has extended our methods to next-generation sequence matching, including frameshifts in DNA, a subject of relevance to the NCBI BLAST services. In a practical test of our methods, our frameshift statistics found many novel human pseudogenes.

Support Year: 15
Fiscal Year: 2013
Total Cost: $267,102
Name: National Library of Medicine
Spouge, John L; Mariño-Ramírez, Leonardo; Sheetlin, Sergey L (2014) Searching for repeats, as an example of using the generalised Ruzzo-Tompa algorithm to find optimal subsequences with gaps. Int J Bioinform Res Appl 10:384-408
Sheetlin, Sergey L; Park, Yonil; Frith, Martin C et al. (2014) Frameshift alignment: statistics and post-genomic applications. Bioinformatics :