This project preserves and provides access to the image data associated with the ClueWeb09 dataset. The ClueWeb09 dataset was created by the PI in early 2009 to support research on the web, information retrieval, and related human language technologies. The dataset contains the texts of about 1 billion of the most important web pages written in ten major languages. The dataset was adopted quickly by the research community: 55 copies were licensed in the first 3 months of availability, and it was used in 4 out of 7 tracks of the National Institute of Standards and Technology's (NIST's) 2009 TREC evaluation of information retrieval research.

Agency: National Science Foundation (NSF)
Institute: Division of Information and Intelligent Systems (IIS)
Type: Standard Grant (Standard)
Application #: 0948856
Program Officer: Maria Zemankova
Project Start:
Project End:
Budget Start: 2009-09-15
Budget End: 2010-08-31
Support Year:
Fiscal Year: 2009
Total Cost: $25,589
Indirect Cost:
Name: Carnegie-Mellon University
Department:
Type:
DUNS #:
City: Pittsburgh
State: PA
Country: United States
Zip Code: 15213