Real-time information is a fundamental emerging issue in the creation and management of Web content. Increasingly, rather than consulting relatively static sources that are indexed on a periodic basis, people refer to information on news sites, blogs, social-networking sites, and Twitter feeds that change dynamically and spread rapidly. This project will study how information content varies over time, how it is transmitted through underlying social networks, and how its recipients assemble it into larger units. The project will explore new techniques for addressing these issues, based on novel methods for tracking, analyzing, and presenting information that evolves and spreads rapidly over time. The resulting approach aims to transform important aspects of the ways in which real-time information on the Web is handled.
First, the fundamental units of information that spread through the underlying information networks will be identified. From a set of nearly 1 billion news media articles and blog posts (approximately 6 TB of data) and a collection of 500 million tweets from Twitter, small (generally textual) units of information will be identified that remain relatively stable as they spread through the Web. Next, the temporal variation within these basic units will be analyzed and modeled. This modeling will include connections with biological models of epidemics, as well as new frameworks that exploit the fundamental differences between biological and social contagion. Finally, the temporal variation will be related to network-level models for the diffusion of this information. Generally, the actual networks on which real-time information spreads cannot be directly observed, nor can the influence of any particular node in the network be directly measured. Therefore, the project will develop machine-learning techniques that infer these hidden networks and unobserved levels of influence.
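To make the epidemic analogy concrete, the following is a minimal sketch of a classical discrete-time SIR (susceptible, infected, recovered) model, one of the standard biological models of contagion. It is an illustration only and is not the project's own model. The mapping is an assumption made for this example: "susceptible" sites have not yet mentioned a unit of information, "infected" sites are actively mentioning it, and "recovered" sites have stopped. The function name `simulate_sir` and the parameters `beta` and `gamma` are hypothetical.

```python
def simulate_sir(population, initial_infected, beta, gamma, steps):
    """Simulate a discrete-time SIR model and return the (S, I, R) history.

    beta  : per-step transmission rate (assumed, illustrative)
    gamma : per-step recovery rate, expected to lie in [0, 1]
    """
    s = float(population - initial_infected)  # have not yet mentioned the item
    i = float(initial_infected)               # actively mentioning the item
    r = 0.0                                   # no longer mentioning the item
    history = [(s, i, r)]
    for _ in range(steps):
        # New adoptions scale with contact between susceptible and infected
        # sites, capped so the susceptible pool never goes negative.
        new_infections = min(s, beta * s * i / population)
        # A fixed fraction of active sites loses interest each step.
        new_recoveries = gamma * i
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        history.append((s, i, r))
    return history
```

Calling `simulate_sir(1000, 10, 0.5, 0.1, 50)` returns 51 `(S, I, R)` tuples whose components always sum to the population size. A model of this kind treats every exposure as independent, which is one of the assumptions that may not hold for social contagion.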
For more information, see the project website at http://snap.stanford.edu/proj/mipro