The BLAST group has now incorporated into the BLAST code our faster methods for calculating the Gumbel scale parameter λ, the pre-factor k, and the finite-size correction. The new finite-size correction demonstrably improves the retrieval order. Biologists, for example, notice and find it irritating when an exact match to their query is not the top-ranked hit in a sequence database search. Compared with the old finite-size correction, ours places identical matches at the top of the retrieval list more consistently. NCBI is using our methods to compute the statistical parameters for several new DNA scoring schemes, and we are collaborating with Dr. Martin Frith to extend them to next-generation sequence matching, including DNA alignments with frameshifts.
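As background for the parameters named above, here is a minimal sketch of where λ and k enter the classical Gumbel (Karlin-Altschul) E-value, E = k·m·n·exp(−λS), for a local alignment score S between a query of length m and a database of length n. It is an illustration under stated assumptions, not the BLAST code or this project's method. The function names and the parameter values in the demo are hypothetical placeholders. No finite-size correction, old or new, is applied; such a correction adjusts this uncorrected estimate for edge effects in sequences of finite length.

```python
import math


def gumbel_evalue(score: float, lam: float, k: float, m: int, n: int) -> float:
    """Uncorrected Gumbel (Karlin-Altschul) E-value: E = k * m * n * exp(-lam * score).

    score: local alignment score S
    lam:   Gumbel scale parameter (lambda)
    k:     Gumbel pre-factor
    m, n:  query and database lengths
    NOTE: hypothetical helper; no finite-size (edge-effect) correction is applied.
    """
    if m <= 0 or n <= 0:
        raise ValueError("sequence lengths must be positive")
    return k * m * n * math.exp(-lam * score)


def evalue_to_pvalue(evalue: float) -> float:
    """Probability of at least one chance hit scoring >= S, assuming Poisson-distributed counts."""
    return -math.expm1(-evalue)  # 1 - exp(-E), computed accurately for small E


if __name__ == "__main__":
    # Illustrative placeholder parameters (assumed, not taken from this project).
    LAM, K = 0.267, 0.041
    for s in (40, 50, 60):
        e = gumbel_evalue(s, LAM, K, m=300, n=1_000_000)
        print(f"S={s}: E={e:.3g}, P={evalue_to_pvalue(e):.3g}")
```

In this uncorrected form, doubling the database length n doubles E, and raising the score S by 10 divides E by exp(10λ).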

Support Year: 13
Fiscal Year: 2011
Total Cost: $69,945
Name: National Library of Medicine
Gauran, Iris Ivy M; Park, Junyong; Lim, Johan et al. (2018) Empirical null estimation using zero-inflated discrete mixture distributions and its application to protein domain data. Biometrics 74:458-471
Sheetlin, Sergey; Park, Yonil; Frith, Martin C et al. (2016) ALP & FALP: C++ libraries for pairwise local alignment E-values. Bioinformatics 32:304-5
Carroll, Hyrum D; Williams, Alex C; Davis, Anthony G et al. (2015) Improving Retrieval Efficacy of Homology Searches Using the False Discovery Rate. IEEE/ACM Trans Comput Biol Bioinform 12:531-7
Sheetlin, Sergey L; Park, Yonil; Frith, Martin C et al. (2014) Frameshift alignment: statistics and post-genomic applications. Bioinformatics
Park, Yonil; Sheetlin, Sergey; Ma, Ning et al. (2012) New finite-size correction for local alignment score distributions. BMC Res Notes 5:286
Sheetlin, Sergey; Park, Yonil; Spouge, John L (2011) Objective method for estimating asymptotic parameters, with an application to sequence alignment. Phys Rev E Stat Nonlin Soft Matter Phys 84:031914
Park, Yonil; Sheetlin, Sergey; Spouge, John L (2009) Estimating the Gumbel scale parameter for local alignment of random sequences by importance sampling with stopping times. Ann Stat 37:3697