New approaches to testing the effectiveness of retrieval methods have been studied. One method of testing retrieval performance, based on the comparison of statistically independent retrieval methods, has been developed. A second method, based on modelling the document collection and the relevance relation, has been investigated and compared with the first. This second method uses the hypergeometric probability distribution and yields results quite consistent with the first. A paper describing this modelling method has been published. A new measure of retrieval performance based on information theory has also been developed. The measure is simple to apply: it calculates the number of bits of information that a ranked retrieval method produces by concentrating relevant documents in the first n ranks of the retrieval output. It has many intuitively desirable properties and agrees with precision-recall curves in those cases where the curves allow an unambiguous comparison of different retrieval methods. A paper describing the methodology has been published. Studies have been performed to evaluate the sensitivity of retrieval testing to the number of queries in a test set and to the particular measure used. Several of the classical test sets (CRAN, CISI, MED, and others), as well as a test subset of Medline in the area of molecular biology, have been studied. Special attention has been paid to the problem of large databases, where only incomplete sampling is possible.
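The role of the hypergeometric distribution in such a model can be illustrated with a minimal sketch. The abstract does not give the formulas used in the project, so everything below is an assumption made for illustration: a collection of N documents of which R are relevant, a null model in which the top n ranks are a random draw without replacement, and a "surprisal" score in bits defined as the negative base-2 logarithm of the chance of seeing at least k relevant documents in those n ranks. The function names are hypothetical, and the score is a generic information-theoretic quantity, not necessarily the measure described in the published paper.

```python
from math import comb, log2


def hypergeom_pmf(k, N, R, n):
    """Probability of exactly k relevant documents among n drawn at random
    (without replacement) from N documents, R of which are relevant."""
    # Outside the feasible range the probability is zero.
    if k < max(0, n - (N - R)) or k > min(n, R):
        return 0.0
    return comb(R, k) * comb(N - R, n - k) / comb(N, n)


def hypergeom_tail(k, N, R, n):
    """Probability of at least k relevant documents in the top n ranks
    under the random-ranking null model."""
    return sum(hypergeom_pmf(j, N, R, n) for j in range(k, min(n, R) + 1))


def surprisal_bits(k, N, R, n):
    """Illustrative score (an assumption, not the project's measure):
    -log2 of the chance that a random ranking does at least this well."""
    return -log2(hypergeom_tail(k, N, R, n))


if __name__ == "__main__":
    # Toy collection: 10 documents, 3 relevant, all 3 found in the top 3 ranks.
    print(hypergeom_pmf(3, 10, 3, 3))
    print(surprisal_bits(3, 10, 3, 3))
```

Under this sketch, a ranking that places no relevant documents in the top n scores zero bits, and the score grows as relevant documents become more concentrated in the early ranks than chance would predict.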

Agency
National Institutes of Health (NIH)
Institute
National Library of Medicine (NLM)
Type
Intramural Research (Z01)
Project #
1Z01LM000020-02
Application #
3781265
Study Section
Project Start
Project End
Budget Start
Budget End
Support Year
2
Fiscal Year
1993
Total Cost
Indirect Cost
Name
National Library of Medicine
Department
Type
DUNS #
City
State
Country
United States
Zip Code