CI-ADDO-EN: Collaborative Proposal: Supporting Web-Scale Experimentation using the Lemur Toolkit

Croft, W. Bruce

Abstract

This project maintains and enhances existing community software infrastructure and creates new community data infrastructure, enabling the information retrieval research community and related research communities to conduct research at "web scale", meaning datasets of a billion or more web pages together with large query logs. The software infrastructure is based on the Lemur Toolkit and the associated Indri search engine, which are used by many information retrieval researchers because of their support for multiple retrieval models, multiple forms of evidence, and a powerful probabilistic query language. The enhancements to Lemur include support for the popular MapReduce style of distributed processing and other efficiency improvements that make it practical to do research on large web datasets "out of the box" in common computer hardware environments.

The new data infrastructure consists of the maintenance and distribution of a newly created billion-page dataset, another new web dataset, and large, anonymized search logs that match the datasets. The combination of large datasets and corresponding large search logs enables a broad community to conduct research with more realistic data resources than were previously available. This research will lead to further advances in the understanding of the underlying issues for large-scale, personalized search, which will be an important part of the next generation of search engines.

For further information, see the project web site at www.lemurproject.org.

Funding Agency

Agency: National Science Foundation (NSF)
Institute: Division of Computer and Network Systems (CNS)
Application #: 0934322
Program Officer: Vasant G. Honavar

Budget Start: 2010-06-01
Budget End: 2014-05-31
Fiscal Year: 2009
Total Cost: $530,000

Institution

University of Massachusetts Amherst, Amherst, MA, United States
