SGER: Exploratory Research: Using the Cyberinfrastructure to build a Full Text Index to the Web

Arms, William

Abstract

This project will attempt to build a full text index to the textual web pages in the historical collections of the Internet Archive. The Internet Archive has captured and stored a snapshot of the web every two months since 1996. The collection now comprises approximately 40 billion web pages, consuming multiple petabytes of storage. The resulting index may be the largest and best organized inverted index ever created that is freely available to academic researchers. It will enable social and information scientists to explore altogether new dimensions of contemporary events and practices, while offering information scientists a vital large-scale testing resource in areas such as advanced information retrieval on semistructured collections.

Funding Agency

Agency: National Science Foundation (NSF)
Institute: Division of Information and Intelligent Systems (IIS)
Type: Standard Grant (Standard)
Application #: 0634677
Program Officer: Stephen Griffin

Project Start
Project End
Budget Start: 2006-10-01
Budget End: 2007-09-30
Support Year
Fiscal Year: 2006
Total Cost: $120,000
Indirect Cost

Institution

Cornell University, Ithaca, NY, United States
