The NCBI CoreTools now contains our code, which calculates all parameters of the modified Gumbel distribution (the Gumbel scale parameter λ, the pre-factor k, and the finite-size correction) to practical accuracy in less than 1 second. The BLAST group plans to use our faster calculations to generate the modified Gumbel parameters for several new DNA scoring schemes. The BLAST group has also implemented the new finite-size correction directly in its code, demonstrably improving BLAST sequence retrieval. The implementation even had unexpected benefits, such as improved retrieval from the Conserved Domain Database with RPS-BLAST. In addition, biologists find it irritating when an exact match to their query is not the highest-ranked hit in a database search. The new finite-size correction places identical matches at the top of the retrieval list more consistently than the old finite-size correction did. We are now collaborating with Dr. Martin Frith to extend our methods to next-generation sequence matching, including frameshifts in DNA.
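
To illustrate how the Gumbel parameters are used once they have been computed, the following minimal sketch converts an alignment score into an E-value and a P-value with the standard Karlin-Altschul relation E = k·m·n·exp(-λS). It is not the CoreTools or BLAST code: the function names are hypothetical, the numeric parameters are illustrative values of the kind reported for gapped BLOSUM62 scoring, and the finite-size correction described above is deliberately omitted, so m and n are used as raw sequence lengths.

```python
import math


def evalue(score, lam, k, m, n):
    """Expected number of chance local alignments scoring >= `score`.

    lam and k are the Gumbel scale parameter and pre-factor;
    m and n are the query and database lengths (uncorrected here).
    """
    return k * m * n * math.exp(-lam * score)


def pvalue(score, lam, k, m, n):
    """Probability of at least one chance alignment scoring >= `score`.

    Uses P = 1 - exp(-E); expm1 keeps precision when E is very small.
    """
    return -math.expm1(-evalue(score, lam, k, m, n))


if __name__ == "__main__":
    # Illustrative parameters only (assumed, not taken from the project).
    lam, k = 0.267, 0.041
    m, n = 250, 1_000_000
    for s in (40, 60, 80):
        e = evalue(s, lam, k, m, n)
        p = pvalue(s, lam, k, m, n)
        print(f"score={s}  E={e:.3g}  P={p:.3g}")
```

A finite-size correction would typically replace m and n with shorter effective lengths before this formula is applied, which is why the choice of correction changes the ranking of hits.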

Support Year: 14
Fiscal Year: 2012
Total Cost: $303,689
Name: National Library of Medicine
Gauran, Iris Ivy M; Park, Junyong; Lim, Johan et al. (2018) Empirical null estimation using zero-inflated discrete mixture distributions and its application to protein domain data. Biometrics 74:458-471
Sheetlin, Sergey; Park, Yonil; Frith, Martin C et al. (2016) ALP & FALP: C++ libraries for pairwise local alignment E-values. Bioinformatics 32:304-5
Carroll, Hyrum D; Williams, Alex C; Davis, Anthony G et al. (2015) Improving Retrieval Efficacy of Homology Searches Using the False Discovery Rate. IEEE/ACM Trans Comput Biol Bioinform 12:531-7
Sheetlin, Sergey L; Park, Yonil; Frith, Martin C et al. (2014) Frameshift alignment: statistics and post-genomic applications. Bioinformatics :
Park, Yonil; Sheetlin, Sergey; Ma, Ning et al. (2012) New finite-size correction for local alignment score distributions. BMC Res Notes 5:286
Sheetlin, Sergey; Park, Yonil; Spouge, John L (2011) Objective method for estimating asymptotic parameters, with an application to sequence alignment. Phys Rev E Stat Nonlin Soft Matter Phys 84:031914
Park, Yonil; Sheetlin, Sergey; Spouge, John L (2009) Estimating the Gumbel scale parameter for local alignment of random sequences by importance sampling with stopping times. Ann Stat 37:3697