The Social Web has become an important medium for social interaction and potentially a powerful new computational tool. As people create and share information online, their collective activity shapes the structure and usefulness of the Social Web and can even be used to address a range of problems from collective decision-making to trend prediction. Understanding how the aggregate activity of many interconnected people evolves is crucial to our ability to transform the Social Web into a platform for social computing.

Mathematical modeling is a powerful tool for studying collective human activity. In previous work, the PI and collaborators developed a framework for mathematically modeling emergent behavior of groups of users on the Social Web. This framework allowed the modeler to relate aggregate behavior of a group of users to simple descriptions of their individual behavior. However, it failed to take into account key aspects of the Social Web: user diversity and the extent to which social links indicate a commonality of users' interests.

The goal of this project is to develop a methodology for modeling diverse groups of users on the Social Web and to understand how user heterogeneity affects group behavior. Mathematical modeling and analysis will lead to better, more effective Web sites by identifying productive ways to display information to users, as well as techniques for promoting collaboration and enhancing participation. Analysis will also lead to new insights into how to use human activity for computation, and eventually a programming toolkit for social computing.

Project Report

Using Models of Social Dynamics to Predict Popularity of User-Generated Content

Social media sites such as Digg use crowd-sourcing and social ("follow") links to help people find interesting content. Crowd-sourcing relies on the reactions of the first people to see new content to indicate whether others will find it interesting. Social links fine-tune the selection by emphasizing the reactions of a person's friends. These techniques are especially useful for filtering the flood of online content whose quality is hard, if not impossible, to determine automatically.

Realizing this potential of social media poses challenges, however. Are early user reactions typical of later reactions? How do other factors, such as the web site's user interface, the timing of posts, the number of friends, and deliberate "gaming the system," affect users' reactions? These questions make it difficult to relate early user reactions directly to a story's appeal to the user community. Researchers from the University of Southern California and the Institute for Molecular Manufacturing working on the SoCS project addressed these challenges with models of users' behavior on Digg.

The figure above shows how the popularity of three stories, as measured by the number of votes (diggs), changed over time since each story's submission. The abrupt increase in slope corresponds to promotion to the front page. Our goal was to understand such curves. What makes some stories more popular than others? How does popularity grow, and why does it saturate? What role do social networks play in the evolution of popularity? Can this behavior be predicted?

Researchers used a physics-based framework to model users and stories on Digg. A user who sees a story will digg it with a probability related to how interesting the story is to that user. The more interesting the story, the more popular it will become. However, digging a story also depends on how easily users can find it.
This factor, called visibility, depends on Digg's user interface. The model tracks how visibility changes over time. A story starts in the Upcoming Stories queue, where its visibility decreases rapidly as users submit subsequent stories. After promotion to the Front Page, visibility skyrockets and then decreases as additional stories are promoted. Social links also affect visibility: each new digg makes the story visible to that digger's fans.

Though the model is simple, the hard part was calibrating it from the available data, i.e., figuring out how often stories are submitted and promoted, how persistently users explore Upcoming and Front Page stories, how often users visit Digg, how their activity varies during the day, and so on. For individual stories, SoCS researchers determined interestingness by having the model match, as closely as possible, the observed growth in the number of diggs the story received. Hence, one novel application of our modeling framework is estimating how interesting stories are. Separating interestingness from visibility provides a different, perhaps truer, measure of a story's quality than its number of diggs.

A second application of modeling is prediction. The votes a story receives up to a certain time are used to estimate its interestingness. SoCS researchers then used the model to predict the subsequent votes the story would receive, both in total and from different groups of users, e.g., to distinguish stories of general interest from those appealing mainly to the submitter's friends. As an example, the figure above shows predictions of story popularity (black line) among three groups of users: the submitter's fans, users who are fans of other diggers but not of the submitter, and users who are not fans of any previous diggers. The x-axis is time, in hours, since submission. The prediction in this example is made at promotion time (vertical dashed line). The model predicts actual votes (dots) fairly well.
The model also provides confidence intervals for the predictions, indicated by the shaded areas in the figure, which estimate how accurate the predictions are. The predictions could be updated as new votes arrive, thereby continually giving both short- and long-term forecasts of the story's subsequent votes.
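
The two applications described in this report, estimating interestingness and predicting later votes, can be illustrated with a deliberately simplified sketch. It is not the calibrated model used by the SoCS researchers. It assumes that visibility is supplied as an expected number of viewers per time step (`base_views`), that every viewer votes with the same probability `r` (the interestingness), and that each new vote exposes the story to a fixed number of the voter's fans in the next step (`fans_per_vote`). All names, parameters, and numbers below are hypothetical.

```python
# Toy sketch only: a hypothetical, heavily simplified stand-in for the
# visibility/interestingness model described in the report.

def simulate_votes(r, base_views, fans_per_vote=0.0):
    """Return expected cumulative votes after each time step.

    r             -- assumed probability that a user who sees the story votes
    base_views    -- assumed expected viewers per step from the site interface
    fans_per_vote -- assumed extra viewers in the next step per new vote
    """
    cumulative = []
    total = 0.0
    previous_new_votes = 0.0
    for views in base_views:
        # Visibility = interface exposure + exposure to fans of recent voters.
        visibility = views + fans_per_vote * previous_new_votes
        new_votes = r * visibility
        total += new_votes
        cumulative.append(total)
        previous_new_votes = new_votes
    return cumulative


def estimate_interestingness(observed, base_views, fans_per_vote=0.0, grid=1000):
    """Grid-search r in [0, 1] to best match observed cumulative votes."""
    best_r, best_error = 0.0, float("inf")
    for i in range(grid + 1):
        r = i / grid
        simulated = simulate_votes(r, base_views[:len(observed)], fans_per_vote)
        error = sum((s - o) ** 2 for s, o in zip(simulated, observed))
        if error < best_error:
            best_r, best_error = r, error
    return best_r


# Made-up exposure profile: low visibility in the upcoming queue, a jump at
# promotion (step 3), then decay as newer stories are promoted.
base_views = [100, 60, 40, 500, 300, 200, 100]
fans_per_vote = 2.0

# Synthetic "truth" generated with r = 0.2; only the first three steps
# (before promotion) are treated as observed.
truth = simulate_votes(0.2, base_views, fans_per_vote)
observed_early = truth[:3]

r_hat = estimate_interestingness(observed_early, base_views, fans_per_vote)
forecast = simulate_votes(r_hat, base_views, fans_per_vote)
```

Here `r_hat` is fitted from the pre-promotion votes alone and `forecast` extends the curve past promotion. Refitting `r_hat` each time new votes arrive would give the kind of continually updated forecast mentioned above. Confidence intervals, user groups, and calibration from real Digg data are left out of this sketch.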

Agency: National Science Foundation (NSF)
Institute: Division of Information and Intelligent Systems (IIS)
Type: Standard Grant (Standard)
Application #: 0968370
Program Officer: Kevin Crowston
Budget Start: 2010-08-01
Budget End: 2012-07-31
Fiscal Year: 2009
Total Cost: $258,000
Name: University of Southern California
City: Los Angeles
State: CA
Country: United States
Zip Code: 90089