With the growth of biological information, the efficiency of database retrieval has become central to the biological enterprise. In particular, when a retrieval method is changed, one must be able to evaluate whether the change is an improvement. Initially, using U-statistics, we developed central limit theorems describing the behavior of the receiver operating characteristic curve n (ROCn) under bootstrapping. Our methodology was applied to determine which changes to the PSI-BLAST program actually constitute improvements. Eventually, however, we rejected the ROCn as an unacceptable measure of database retrieval efficacy for bioinformatics, substituting in its place the Threshold Average Precision (TAPk). By measuring the retrieval efficacy for each query (which the ROCn cannot always do), the TAPk permits metrics on retrieval methods, which determine how closely related two retrieval methods are by their behavior query by query. The metrics can distinguish, e.g., a "tweak" on an accepted retrieval algorithm (which produces retrieval "close" to the algorithm's) from a truly novel algorithm (which produces a "distant" retrieval). They thereby reward originality in bioinformatics by objectively displaying incremental improvements of existing algorithms for what they are.
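As a rough illustration of the quantities named above, the following Python sketch computes a per-query ROCn and a bootstrap interval for the difference between two retrieval methods. It assumes the commonly used definition of ROCn (the number of true positives ranked above each of the first n false positives, summed and divided by n times the total number of true positives). The resampling step is a plain percentile bootstrap over queries, swapped in for simplicity; it is not the U-statistic central limit analysis the abstract describes. The function names `roc_n` and `bootstrap_mean_diff`, and the choice to return `None` when a list contains fewer than n false positives, are assumptions made for this sketch.

```python
import random
from typing import List, Optional, Sequence, Tuple


def roc_n(ranked_is_true: Sequence[bool], n: int, total_true: int) -> Optional[float]:
    """ROCn for one ranked retrieval list (best hit first).

    ranked_is_true[i] is True when the i-th hit is a true positive.
    Returns None when the score is undefined in this sketch: no true
    positives exist, or the list has fewer than n false positives.
    """
    if n <= 0 or total_true <= 0:
        return None
    true_seen = 0
    false_seen = 0
    area = 0
    for is_true in ranked_is_true:
        if is_true:
            true_seen += 1
        else:
            false_seen += 1
            area += true_seen  # true positives ranked above this false positive
            if false_seen == n:
                return area / (n * total_true)
    return None  # fewer than n false positives were retrieved


def bootstrap_mean_diff(
    scores_a: Sequence[float],
    scores_b: Sequence[float],
    resamples: int = 1000,
    seed: int = 0,
) -> Tuple[float, float]:
    """Percentile bootstrap (95%) for the mean paired difference a - b.

    Queries are resampled with replacement; scores_a[i] and scores_b[i]
    must be the two methods' scores on the same query.
    """
    rng = random.Random(seed)
    diffs: List[float] = [a - b for a, b in zip(scores_a, scores_b)]
    m = len(diffs)
    means = []
    for _ in range(resamples):
        sample = [diffs[rng.randrange(m)] for _ in range(m)]
        means.append(sum(sample) / m)
    means.sort()
    lower = means[int(0.025 * resamples)]
    upper = means[int(0.975 * resamples) - 1]
    return lower, upper
```

For example, `roc_n([True, False, True, False], n=2, total_true=2)` evaluates to 0.75, while `roc_n([True, True], n=1, total_true=2)` returns `None` because no false positive was retrieved. An interval from `bootstrap_mean_diff` that excludes zero would suggest, under these simplifying assumptions, that one method scores consistently higher than the other across queries.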

Support Year: 9
Fiscal Year: 2011
Total Cost: $19,987
Name: National Library of Medicine
Carroll, Hyrum D; Williams, Alex C; Davis, Anthony G et al. (2015) Improving Retrieval Efficacy of Homology Searches Using the False Discovery Rate. IEEE/ACM Trans Comput Biol Bioinform 12:531-7
Carroll, Hyrum D; Kann, Maricel G; Sheetlin, Sergey L et al. (2010) Threshold Average Precision (TAP-k): a measure of retrieval designed for bioinformatics. Bioinformatics 26:1708-13