This project maintains and enhances existing community software infrastructure and creates new community data infrastructure, enabling the information retrieval research community and related research communities to conduct research at "web scale", meaning datasets of a billion or more web pages together with large query logs. The software infrastructure is based on the Lemur Toolkit and the associated Indri search engine, which are used by many information retrieval researchers because they support multiple retrieval models, multiple forms of evidence, and a powerful probabilistic query language. The enhancements to Lemur include support for the popular MapReduce style of distributed processing and other efficiency improvements that make it practical to do research on large web datasets "out of the box" in common computer hardware environments.

The new data infrastructure consists of maintenance and distribution of a newly created billion-page dataset, another new web dataset, and large, anonymized search logs that match the datasets. The combination of large datasets and corresponding large search logs enables a broad community to conduct research with more realistic data resources than were previously available. This research will lead to further advances in the understanding of the underlying issues for large-scale, personalized search, which will be an important part of the next generation of search engines.

For further information, see the project web site at www.lemurproject.org.

Agency: National Science Foundation (NSF)
Institute: Division of Computer and Network Systems (CNS)
Application #: 0934322
Program Officer: Vasant G. Honavar
Project Start:
Project End:
Budget Start: 2010-06-01
Budget End: 2014-05-31
Support Year:
Fiscal Year: 2009
Total Cost: $530,000
Indirect Cost:
Name: University of Massachusetts Amherst
Department:
Type:
DUNS #:
City: Amherst
State: MA
Country: United States
Zip Code: 01003