III-COR-Medium: Efficient and Effective Search Services Over Archival Webs

Davison, Brian; Suel, Torsten

Abstract

The Web is enormous and in constant flux, causing much content to be lost over time. Historical collections of web content are thus of monumental value in preserving records of significant aspects of modern society. The Internet Archive offers access to hundreds of billions of historical web page snapshots. The scale of such archives, however, presents tremendous challenges to making this content fully searchable. This research effort investigates efficient and effective approaches to store, index, and retrieve web content from large-scale historical archives. In addition, the temporal content and structure of the archives are mined to exploit temporal characteristics that can improve search result ranking. Technological advances from this work are being tested on content from and in collaboration with the Internet Archive and integrated into its infrastructure, enabling new archival search capabilities for the public.

www.cse.lehigh.edu/~brian/nsf/archives-08.html

Funding Agency

Agency: National Science Foundation (NSF)
Institute: Division of Information and Intelligent Systems (IIS)
Type: Standard Grant (Standard)
Application #: 0803605
Program Officer: Maria Zemankova

Budget Start: 2008-09-01
Budget End: 2012-08-31
Fiscal Year: 2008
Total Cost: $900,000

Institution

Lehigh University, Bethlehem, PA, United States
