Evaluating information retrieval systems, such as search engines, is critical to their effective development. Current evaluation methodologies are generally variants of the Cranfield paradigm, which relies on effectively complete, and thus prohibitively expensive, relevance judgment sets: tens to hundreds of thousands of documents must be judged by human assessors for relevance with respect to dozens to hundreds of user queries, at great cost in both time and money. This exploratory project investigates a new alternative paradigm for information retrieval evaluation based on "nuggets": atomic units of relevant information, one instantiation of which is simply the sentence or short passage that causes a judge to deem a document relevant at the time of assessment. The hypothesis is that while it is likely impossible to find all relevant documents for a query in web-scale and/or dynamic collections, it is far more tractable to find all or nearly all nuggets (i.e., relevant information), with which one can then perform effective, reusable evaluation at scale and with ease. At evaluation time, relevance assessments are created dynamically for retrieved documents based on the quantity and quality of the relevant information they contain. This new evaluation paradigm is inherently scalable and permits the use of all standard measures of retrieval performance, including those involving graded relevance judgments, novelty, diversity, and so on; it further permits new kinds of evaluations not heretofore possible.
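To make the dynamic assessment idea concrete, the following is a minimal sketch in Python of how document judgments might be derived from a nugget set at evaluation time. The nugget-matching heuristic (simple substring containment), the grading thresholds, and all function names are illustrative assumptions, not the project's actual method:

    # Minimal sketch of nugget-based dynamic relevance assessment.
    # All names, the matching heuristic, and the grading thresholds are
    # illustrative assumptions, not the project's actual implementation.

    import math

    def contains_nugget(document_text: str, nugget: str) -> bool:
        # Assumed heuristic: case-insensitive substring containment.
        # A real system would likely use passage-level semantic matching.
        return nugget.lower() in document_text.lower()

    def graded_relevance(document_text: str, nuggets: list[str]) -> int:
        # Dynamically assign a graded judgment from the number of distinct
        # nuggets (atomic units of relevant information) the document contains.
        hits = sum(contains_nugget(document_text, n) for n in nuggets)
        if hits == 0:
            return 0   # not relevant
        if hits == 1:
            return 1   # marginally relevant
        return 2       # highly relevant

    def ndcg_at_k(ranked_docs: list[str], nuggets: list[str], k: int = 10) -> float:
        # Standard nDCG@k computed over the dynamically created judgments.
        # The ideal ordering here is a simplification: it sorts only the
        # gains of the retrieved documents rather than a full judgment pool.
        gains = [graded_relevance(doc, nuggets) for doc in ranked_docs[:k]]
        dcg = sum(g / math.log2(i + 2) for i, g in enumerate(gains))
        idcg = sum(g / math.log2(i + 2)
                   for i, g in enumerate(sorted(gains, reverse=True)))
        return dcg / idcg if idcg > 0 else 0.0

Because judgments are derived on the fly from a comparatively small, stable nugget set rather than from exhaustive per-document judging, documents never seen by an assessor can still be scored, which is what makes the paradigm reusable over web-scale and dynamic collections.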

The project plan includes the development and release of nugget-based evaluation data sets for use by academia and industry. To foster this effort, the project team has close ties with the US National Institute of Standards and Technology (NIST) and the Japanese National Institute of Informatics (through NTCIR), two of the premier organizations that develop and release information retrieval data sets. All research results and data sets developed as part of this project are available at the project website (www.ccs.neu.edu/home/jaa/IIS-1256172/). The project also provides education and training for students and supports the development of curricular materials based on the project results.

Agency: National Science Foundation (NSF)
Institute: Division of Information and Intelligent Systems (IIS)
Type: Standard Grant
Application #: 1256172
Program Officer: Maria Zemankova
Budget Start: 2012-09-15
Budget End: 2015-02-28
Fiscal Year: 2012
Total Cost: $150,000
Name: Northeastern University
City: Boston
State: MA
Country: United States
Zip Code: 02115