Many applications generate data that can be modeled as graphs: Biological networks, social networks, ecological networks and food-webs, among others. Traditional graph theory and most current research in graph modeling, querying, and mining concentrates on problems where the graph structure is inherently static and does not change with time. But networks in the real world are dynamic in nature with a wide range of temporal changes. While the topology of networks such as social networks and transportation networks undergoes gradual change (or evolution), the content (information flow, annotations) changes more rapidly.

Against this background, this project aims to develop a set of scalable querying and mining tools for dynamic graphs by integrating techniques from databases, data mining, and algorithms. The first research thrust examines inherent properties for characterizing dynamic graphs, specifically the dynamic reachability structure of nodes. It also investigates high-fidelity methods for generating dynamic graphs based on these properties. The second research thrust aims to develop summarization techniques for dynamic graph structures. These techniques can be used to compress large graph datasets, to make predictions about future values, and to query information cascades under partial observation. The third research thrust aims to develop techniques for mining significant dynamic subgraphs under different constraints of connectivity such as fixed subgraph structure, connected subgraphs, and smooth subgraphs. The goal is here to find anomalous patterns in dynamic graph datasets using a statistical characterization of background behavior. The final research thrust reconsiders the first three research thrusts from the point of view of content and topic models in order to understand the relationship between content of a message and its flow in a network. The developed methods will be evaluated using a number of real-world data sets including email datasets such as Enron, re-tweeting activity data sets on Twitter, Facebook wall posts, and transportation networks.

An important result of this work is a theoretically well-founded and empirically verifiable framework for modeling, querying and mining of dynamic graphs. Aspects of dynamic behavior in which both the structure of networks and their content (information flow, annotations, etc.) change will be considered. The study of such dynamic networks and how information flows through them is essential to developing a theory of dynamic networks and their evolution. This work helps answer questions such as power-law applies to dynamic behavior, whether content of a message can predict its flow and vice versa, whether anomalies in a dynamic network can be mined effectively by building either an empirical summary or a generative model. Robust open source tools based on the developed algorithms will be released for research, academic and non-profit endeavors. The research is expected to yield new techniques in graph algorithms, graph databases, and graph mining, and realize a collection of tools that can be used by scientists, and ultimately lead to a theory for dynamic graphs.

Broader Impacts: The proposed project will integrate research and education by introducing the results of the project into a graduate seminar, and a graduate course on information management. The project will support a postdoctoral researcher and train graduate students. The project offers enhanced opportunities for research-based training of graduate and undergraduate students, including members of under-represented groups e.g., females in Computer Science at the University of California at Santa Barbara. For high school students, the CNSI Apprentice Research Program at UCSB brings in high school students every summer. The open source implementations of algorithms resulting from this work will be freely disseminated to the community.

Agency
National Science Foundation (NSF)
Institute
Division of Information and Intelligent Systems (IIS)
Application #
1219254
Program Officer
Sylvia Spengler
Project Start
Project End
Budget Start
2012-09-01
Budget End
2016-08-31
Support Year
Fiscal Year
2012
Total Cost
$514,318
Indirect Cost
Name
University of California Santa Barbara
Department
Type
DUNS #
City
Santa Barbara
State
CA
Country
United States
Zip Code
93106