With funding from the National Science Foundation, the Cornell project will create a laboratory for social science research based on a largely untapped dataset: the Internet Archive's 40-billion-page collection of Web pages. These snapshots of the Web have been captured and archived every two months for nearly ten years. The project will make very large portions of this massive collection widely accessible for social science research. The flood of available on-line information - from corporate web pages to news groups and blogs - has the potential to open up new frontiers in social science research. The Cornell team plans to build an intelligent front-end that will make the Internet Archive broadly accessible to social scientists, and to develop, test, and refine these tools through a specific research application - the diffusion of innovation.

The development of such tools requires the application of cutting-edge research in natural language processing and machine learning algorithms. The project team also includes computer scientists with expertise in the privacy-preserving analysis of data - a basic challenge in making on-line data more readily accessible for research and policy applications in the social sciences.

The importance of the Web Archive extends beyond pure research to practical applications for business and government. These tools can be used to identify market trends, the rise and fall of demand, and the spread of consumer opinion. Community watchdog groups will be able to track the spread of "hate sites", and government agencies will be able to trace past and current uses of the Web for organizing and coordinating terrorist attacks.

Agency: National Science Foundation (NSF)
Institute: Division of Behavioral and Cognitive Sciences (BCS)
Application #: 0537606
Program Officer: Patricia White
Project Start:
Project End:
Budget Start: 2006-01-01
Budget End: 2010-06-30
Support Year:
Fiscal Year: 2005
Total Cost: $1,999,990
Indirect Cost:
Name: Cornell University
Department:
Type:
DUNS #:
City: Ithaca
State: NY
Country: United States
Zip Code: 14850