The project will develop new statistical models for network growth and change, and apply these to study the evolution of the Wikipedia. The research builds on latent factor models for social networks and recent advances in variable selection and cluster analysis in high dimensions. Using information on the text in Wikipedia entries and on the Wikipedia's current connectivity structure, the research will estimate where new entries will appear and characterize the local graph structures in different regions of the hyperlinked data set. Although the models are tuned to the Wikipedia, the methodology has general relevance to the study of complex networks.

The Wikipedia is a unique mirror of human knowledge. It has grown quickly, and this growth continues. From the standpoint of understanding how humans organize information, it is important to identify the "holes" in the Wikipedia, where new entries will arise. Similarly, one wants to know whether information on, say, Henry VIII is organized in the same way as information on Homotopy Theory. Both kinds of questions can be analyzed statistically, using publicly available version control data that has been archived to help discover Wikipedia vandalism. The research has direct impact on the study of the structure of human knowledge, and indirect impact on the study of change in complex networks.

Agency: National Science Foundation (NSF)
Institute: Division of Mathematical Sciences (DMS)
Type: Standard Grant (Standard)
Application #: 0907009
Program Officer: Gabor J. Szekely
Project Start:
Project End:
Budget Start: 2009-09-01
Budget End: 2011-08-31
Support Year:
Fiscal Year: 2009
Total Cost: $59,485
Indirect Cost:
Name: Harvard University
Department:
Type:
DUNS #:
City: Cambridge
State: MA
Country: United States
Zip Code: 02138