Two new methods for rating the effectiveness of statistical retrieval methods have been developed, neither of which requires human relevance judgments. Both rely on the hypergeometric distribution of probability theory. The first compares the results of two statistically independent retrieval methods; because of that independence, the comparison supports conclusions about the absolute performance of the method under study. Examples of such statistically independent pairs of methods have been constructed. The second builds an abstract model of a real database in which the relevance relation is modelled, so that retrieval can be tested against it. A new measure of retrieval performance based on information theory has also been devised. The measure is simple to apply: it calculates the number of bits of information that a ranked retrieval method produces by concentrating relevant documents in the first n ranks of the retrieval output. It has many intuitively desirable properties and agrees with precision-recall curves whenever those curves allow an unambiguous comparison of different retrieval methods. Studies are under way to evaluate how sensitive retrieval testing is to the number of queries in a test set and to the particular measure used. Several of the classical test sets (CRAN, CISI, MED, etc.) and a test subset of Medline in the area of molecular biology are under study.
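
The abstract does not state the formula behind the information measure, so the following is only a minimal sketch of one plausible reading. It assumes that, under chance ranking, the number of relevant documents falling in the first n ranks follows a hypergeometric distribution, and that the information produced is the negative base-2 logarithm of the probability of doing at least as well by chance. The function names `hypergeom_tail` and `bits_of_information` and the parameters `N`, `R`, `n`, `k` are illustrative assumptions, not taken from the project.

```python
from math import comb, log2


def hypergeom_tail(N, R, n, k):
    """Chance probability of at least k relevant documents in the top n.

    N: total documents in the collection
    R: documents relevant to the query
    n: number of top ranks examined
    k: relevant documents actually observed in those n ranks
    Assumes the n documents are drawn at random without replacement.
    """
    total = comb(N, n)
    upper = min(n, R)  # cannot observe more relevant documents than this
    hits = sum(comb(R, i) * comb(N - R, n - i) for i in range(k, upper + 1))
    return hits / total


def bits_of_information(N, R, n, k):
    """Assumed measure: -log2 of the chance probability of doing this well."""
    return -log2(hypergeom_tail(N, R, n, k))


if __name__ == "__main__":
    # Hypothetical query: 1000 documents, 10 relevant, 6 of them in the top 10.
    print(bits_of_information(N=1000, R=10, n=10, k=6))
```

Under this reading, a ranking that places no more relevant documents in the top n than chance would guarantee yields zero bits, and the value grows as relevant documents become more concentrated in the early ranks.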

Agency: National Institutes of Health (NIH)
Institute: National Library of Medicine (NLM)
Type: Intramural Research (Z01)
Project #: 1Z01LM000020-01
Application #: 3845110
Study Section:
Project Start:
Project End:
Budget Start:
Budget End:
Support Year: 1
Fiscal Year: 1992
Total Cost:
Indirect Cost:

Name: National Library of Medicine
Department:
Type:
DUNS #:
City:
State:
Country: United States
Zip Code: