The project augments our existing WebBase facility by adding computation and storage servers. WebBase (http://dbpubs.stanford.edu:8091/~testbed/doc2/WebBase/) is a facility that crawls targeted portions of the Web at regular intervals. The system thereby creates unique time series of topic- or domain-focused snapshots.
Existing snapshots include quarterly collections of all government Web sites gathered over several years, a series of daily crawls in 2005 covering over 350 sites relevant to Hurricane Katrina, and daily crawls of sites related to several California elections. The purpose of these collections is to enable computing research that will make large Web archives accessible to historical and other analyses. The system supports Web research, at Stanford and elsewhere, on topics such as filtering, searching, and tagging Web resources. It also supports research by social scientists who study social, political, and cultural trends.