With the growth of biological information, the efficiency of database retrieval has become central to the biological enterprise. In particular, whenever a retrieval method is changed, one must be able to evaluate whether the change is an improvement. Initially, using U-statistics, we developed central limit theorems describing the behavior of the receiver operating characteristic curve n (ROCn) under bootstrapping. We applied this methodology to determine which changes to the PSI-BLAST program actually constitute improvements. Eventually, however, we rejected the ROCn as an unacceptable measure of database retrieval efficacy for bioinformatics and substituted the threshold average precision (TAP-k) in its place. Citation databases show that the TAP-k is receiving attention in bioinformatics, particularly in text retrieval. Drs. Spouge and Carroll have applied the TAP-k to evaluate variants of BLAST retrieval that replace E-values with false discovery rates (FDRs). The results show that FDRs improve PSI-BLAST retrieval, especially for small protein families.
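As a rough illustration of the kind of comparison described above, the following Python sketch computes an ROCn score for one ranked retrieval list and a percentile bootstrap interval for the mean per-query difference between two methods. It assumes the commonly used definition of ROCn (the ROC area accumulated up to the n-th false positive, normalized by n times the number of true positives in the list) and substitutes a generic percentile bootstrap for the U-statistic central limit theorems mentioned in the summary. The function names `roc_n` and `bootstrap_mean_difference`, the padding convention for lists with fewer than n false positives, and the 95% level are assumptions made for this example, not details taken from the project.

```python
import random


def roc_n(labels, n):
    """ROCn for one ranked list (assumed definition, see lead-in).

    labels: booleans in ranked order, best score first; True marks a
    relevant (true positive) record, False an irrelevant one.
    """
    total_true = sum(labels)
    if n <= 0 or total_true == 0:
        raise ValueError("need n > 0 and at least one relevant record")
    true_seen = 0
    false_seen = 0
    area = 0
    for is_true in labels:
        if is_true:
            true_seen += 1
        else:
            false_seen += 1
            area += true_seen  # true positives ranked above this false positive
            if false_seen == n:
                break
    # Assumed convention: if the list holds fewer than n false positives,
    # the missing ones are treated as ranking below every retrieved true positive.
    area += (n - false_seen) * true_seen
    return area / (n * total_true)


def bootstrap_mean_difference(scores_a, scores_b, n_resamples=1000, seed=0):
    """Percentile 95% interval for the mean paired difference (a - b).

    scores_a and scores_b hold one score per query, in the same query order.
    Queries are resampled with replacement; this is a generic bootstrap,
    not the U-statistic approach named in the summary.
    """
    if len(scores_a) != len(scores_b) or not scores_a:
        raise ValueError("score lists must be non-empty and of equal length")
    rng = random.Random(seed)
    m = len(scores_a)
    diffs = []
    for _ in range(n_resamples):
        idx = [rng.randrange(m) for _ in range(m)]
        diffs.append(sum(scores_a[i] - scores_b[i] for i in idx) / m)
    diffs.sort()
    lower = diffs[int(0.025 * n_resamples)]
    upper = diffs[int(0.975 * n_resamples) - 1]
    return lower, upper
```

If the resulting interval excludes zero, the change in retrieval method would be judged a consistent gain or loss across queries under these assumptions; an interval that straddles zero leaves the question open.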
Carroll, Hyrum D; Williams, Alex C; Davis, Anthony G et al. (2015) Improving Retrieval Efficacy of Homology Searches Using the False Discovery Rate. IEEE/ACM Trans Comput Biol Bioinform 12:531-7
Carroll, Hyrum D; Kann, Maricel G; Sheetlin, Sergey L et al. (2010) Threshold Average Precision (TAP-k): a measure of retrieval designed for bioinformatics. Bioinformatics 26:1708-13