With the growth of biological information, the efficiency of database retrieval has become central to the biological enterprise. In particular, when a retrieval method is changed, one must be able to evaluate whether the change is an improvement. We are developing methods based on the statistical bootstrap to assign statistical significance to improvements in database retrieval. The methods are based on mathematical central limit theorems describing the behavior of the receiver operating characteristic curve [n] under bootstrapping. Our methodology has already been applied to determine which changes to the PSI-BLAST program actually constitute improvements. In addition, we are investigating "isotonicity" of relevance in retrieval: the assumption that, after rankwise averaging of relevance, records are retrieved on average in decreasing order of relevance. The isotonic assumption affects the evaluation of retrieval efficiency, and preliminary results indicate that, despite its widespread adoption, the assumption can be wrong.
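As an illustration of the general idea only, and not the project's actual procedure, the following Python sketch compares two retrieval methods with a paired bootstrap over queries. Everything named here is an assumption made for the example: the function names `roc_n` and `paired_bootstrap`, the use of a ROC score truncated at n false positives as the per-query measure, and the choice to resample per-query score differences. It reports the fraction of bootstrap resamples in which the changed method fails to beat the original, which can be read as a rough one-sided significance estimate.

```python
import random


def roc_n(labels, n):
    """Truncated ROC score for one ranked result list (hypothetical helper).

    labels: 1 for a relevant record, 0 for an irrelevant one, best rank first.
    n: number of false positives at which the curve is truncated.
    Returns a value in [0, 1]; 1.0 means every relevant record precedes
    the first n irrelevant ones.
    """
    total_pos = sum(labels)
    if total_pos == 0 or n <= 0:
        return 0.0
    true_pos = 0
    false_pos = 0
    area = 0
    for label in labels:
        if label:
            true_pos += 1
        else:
            false_pos += 1
            area += true_pos  # true positives ranked above this false positive
            if false_pos == n:
                break
    # If the list ran out before n false positives, pad with the final count.
    area += (n - false_pos) * true_pos
    return area / (n * total_pos)


def paired_bootstrap(scores_a, scores_b, n_resamples=2000, seed=0):
    """Resample queries with replacement and compare mean scores.

    scores_a, scores_b: per-query scores for the original and changed method,
    aligned so that index i refers to the same query in both lists.
    Returns (observed mean difference B - A,
             fraction of resamples whose mean difference is <= 0).
    """
    rng = random.Random(seed)
    diffs = [b - a for a, b in zip(scores_a, scores_b)]
    m = len(diffs)
    observed = sum(diffs) / m
    not_better = 0
    for _ in range(n_resamples):
        resampled_mean = sum(diffs[rng.randrange(m)] for _ in range(m)) / m
        if resampled_mean <= 0:
            not_better += 1
    return observed, not_better / n_resamples


# Toy data (invented for the example): three queries, ranked relevance labels.
method_a = [[1, 0, 1, 0], [0, 1, 1, 0], [1, 0, 0, 1]]
method_b = [[1, 1, 0, 0], [1, 0, 1, 0], [1, 1, 0, 0]]
scores_a = [roc_n(labels, 2) for labels in method_a]
scores_b = [roc_n(labels, 2) for labels in method_b]
observed_diff, fraction_not_better = paired_bootstrap(scores_a, scores_b)
```

Resampling the paired differences, rather than each method's scores separately, keeps each query's two results together, so variation in query difficulty does not swamp the comparison. A small fraction suggests the improvement is unlikely to be a resampling accident, though with only a handful of queries such an estimate is very coarse.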

Agency
National Institutes of Health (NIH)
Institute
National Library of Medicine (NLM)
Type
Intramural Research (Z01)
Project #
1Z01LM000094-03
Application #
6843598
Study Section
(CBB)
Support Year
3
Fiscal Year
2003
Name
National Library of Medicine
Country
United States
Schaffer, A A; Aravind, L; Madden, T L et al. (2001) Improving the accuracy of PSI-BLAST protein database searches with composition-based statistics and other refinements. Nucleic Acids Res 29:2994-3005