The BLAST group has now incorporated our faster calculations of the Gumbel scale parameter λ, the pre-factor K, and the finite-size correction into the BLAST code. The new finite-size correction demonstrably improves retrieval order. In practice, biologists notice, and find it frustrating, when an exact match to their query is not the top-ranked hit in a sequence database search. Our finite-size correction places identical matches at the top of the retrieval list more consistently than the previous finite-size correction did. Our methods are being used at NCBI to compute the statistical parameters for several new DNA scoring schemes, and we are collaborating with Dr. Martin Frith to extend them to next-generation sequence matching, including frameshifts in DNA.
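For orientation, the parameters named above enter the E-value as E = K·m·n·exp(−λS) for a score S between sequences of lengths m and n. The sketch below is illustrative only and is not the authors' method: it covers only *ungapped* scoring, where λ is the positive root of Σ p_a q_b exp(λ s_ab) = 1. Gapped λ has no such closed-form equation and must be estimated, which is the problem the faster calculations address. The pre-factor K is taken as a hypothetical placeholder input rather than computed. The optional length adjustment is the classical edge-effect correction (effective lengths reduced by ℓ = ln(Kmn)/H, with H the relative entropy), shown in place of the new finite-size correction. All function names are hypothetical.

```python
import math
from itertools import product


def solve_lambda(score, p, q, tol=1e-12):
    """Ungapped Karlin-Altschul lambda: the root lam > 0 of
    sum_{a,b} p[a] q[b] exp(lam * score(a, b)) = 1.
    Assumes a negative expected score and at least one positive score."""
    def f(lam):
        return sum(p[a] * q[b] * math.exp(lam * score(a, b))
                   for a, b in product(p, q)) - 1.0

    # f(0) = 0 and f is convex with f'(0) < 0, so f < 0 on (0, lambda)
    # and f > 0 beyond it; grow hi until f turns positive, then bisect.
    hi = 1.0
    while f(hi) <= 0.0:
        hi *= 2.0
    lo = 0.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def relative_entropy(score, p, q, lam):
    """H (in nats) of the target pair frequencies p[a] q[b] exp(lam*s) vs. background."""
    return sum(p[a] * q[b] * math.exp(lam * score(a, b)) * lam * score(a, b)
               for a, b in product(p, q))


def evalue(S, m, n, lam, K, H=None):
    """E = K m n exp(-lam S). If H is given, apply the classical edge-effect
    length adjustment ell = ln(K m n) / H (illustrative, not the new correction)."""
    if H is not None:
        ell = math.log(K * m * n) / H
        # Floor effective lengths at 1 to avoid nonpositive values (a simplification).
        m, n = max(m - ell, 1.0), max(n - ell, 1.0)
    return K * m * n * math.exp(-lam * S)


def match_mismatch(a, b):
    """+1 for a match, -1 for a mismatch (a simple DNA scoring scheme)."""
    return 1 if a == b else -1


dna = {c: 0.25 for c in "ACGT"}               # uniform background frequencies
lam = solve_lambda(match_mismatch, dna, dna)  # equals ln 3 for this scheme
H = relative_entropy(match_mismatch, dna, dna, lam)
K = 0.1                                       # hypothetical placeholder, not a computed K
print(lam, H)
print(evalue(20, 1000, 10**6, lam, K), evalue(20, 1000, 10**6, lam, K, H))
```

For the +1/−1 scheme with uniform DNA frequencies, λ solves e^λ/4 + 3e^(−λ)/4 = 1, giving λ = ln 3, which makes a convenient check. Bisection is chosen over Newton's method because convexity of the left-hand side guarantees a bracketed, monotone search.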
|Gauran, Iris Ivy M; Park, Junyong; Lim, Johan et al. (2018) Empirical null estimation using zero-inflated discrete mixture distributions and its application to protein domain data. Biometrics 74:458-471|
|Sheetlin, Sergey; Park, Yonil; Frith, Martin C et al. (2016) ALP & FALP: C++ libraries for pairwise local alignment E-values. Bioinformatics 32:304-5|
|Carroll, Hyrum D; Williams, Alex C; Davis, Anthony G et al. (2015) Improving Retrieval Efficacy of Homology Searches Using the False Discovery Rate. IEEE/ACM Trans Comput Biol Bioinform 12:531-7|
|Sheetlin, Sergey L; Park, Yonil; Frith, Martin C et al. (2014) Frameshift alignment: statistics and post-genomic applications. Bioinformatics :|
|Park, Yonil; Sheetlin, Sergey; Ma, Ning et al. (2012) New finite-size correction for local alignment score distributions. BMC Res Notes 5:286|
|Sheetlin, Sergey; Park, Yonil; Spouge, John L (2011) Objective method for estimating asymptotic parameters, with an application to sequence alignment. Phys Rev E Stat Nonlin Soft Matter Phys 84:031914|
|Park, Yonil; Sheetlin, Sergey; Spouge, John L (2009) Estimating the Gumbel scale parameter for local alignment of random sequences by importance sampling with stopping times. Ann Stat 37:3697|