The NCBI CoreTools now contains code from us, code that calculates to practical accuracies, and in less than 1 sec, all parameters of the modified Gumbel distribution (the Gumbel scale parameter, λ, pre-factor k, and finite-size correction). The BLAST group plans to use our faster calculations to generate the modified Gumbel parameters for several new DNA scoring schemes. The BLAST group have also implemented the new finite-size correction directly into their code, demonstrably improving BLAST sequence retrieval. The implementation even had unexpected benefits, such as improved retrieval from the Conserved Domain Database with rps-BLAST. In addition, biologists notice and find it irritating when an exact match to their query is not the highest-ranked hit in a sequence database. The new finite-size correction places identical matches more consistently at the top of the retrieval list than the old finite-size correction. We are now collaborating with Dr. Martin Frith in extending our methods to next-generation sequence matching, including frameshifts in DNA.
Gauran, Iris Ivy M; Park, Junyong; Lim, Johan et al. (2018) Empirical null estimation using zero-inflated discrete mixture distributions and its application to protein domain data. Biometrics 74:458-471 |
Sheetlin, Sergey; Park, Yonil; Frith, Martin C et al. (2016) ALP & FALP: C++ libraries for pairwise local alignment E-values. Bioinformatics 32:304-5 |
Carroll, Hyrum D; Williams, Alex C; Davis, Anthony G et al. (2015) Improving Retrieval Efficacy of Homology Searches Using the False Discovery Rate. IEEE/ACM Trans Comput Biol Bioinform 12:531-7 |
Sheetlin, Sergey L; Park, Yonil; Frith, Martin C et al. (2014) Frameshift alignment: statistics and post-genomic applications. Bioinformatics : |
Park, Yonil; Sheetlin, Sergey; Ma, Ning et al. (2012) New finite-size correction for local alignment score distributions. BMC Res Notes 5:286 |
Sheetlin, Sergey; Park, Yonil; Spouge, John L (2011) Objective method for estimating asymptotic parameters, with an application to sequence alignment. Phys Rev E Stat Nonlin Soft Matter Phys 84:031914 |
Park, Yonil; Sheetlin, Sergey; Spouge, John L (2009) ESTIMATING THE GUMBEL SCALE PARAMETER FOR LOCAL ALIGNMENT OF RANDOM SEQUENCES BY IMPORTANCE SAMPLING WITH STOPPING TIMES. Ann Stat 37:3697 |