Graphs provide a natural representation of real-world networks e.g. world-wide web, biological networks, social networks. There is a growing body of work on both models of network structure and algorithms to automatically discover patterns (e.g., communities) in the structure. However, statistical methods for assessing the significance of discovered patterns or distinguishing between alternative models have received less attention.

Robust statistical models, which can accurately represent distributions over collections of graphs, are critical for principled quantitative investigation of networks and their properties. Specifically, since sampling distributions (either analytical or empirical) can be used to determine the likelihood of a given sample, statistical models facilitate hypothesis testing and anomaly detection (e.g., graphs with low likelihood can be flagged as anomalous). However, unlike metric spaces the space of graphs exhibits a combinatorial structure which poses significant theoretical and practical challenges which need to be overcome for accurate estimation and efficient inference.

This project investigates the interplay between choice of model representation, parameter estimation, and sampling/inference to develop statistical models that can accurately estimate (parametric) probability distributions over the space of graphs (i.e., graph populations). Speicifically, the project focuses on a novel probabilistic graph model that generates graphs by quilting together a set of subgraphs sampled from simpler basis graph models and the application of the resulting model to (i) explore and define graph classes, (ii) detect anomalies and assess their significance, and (iii) investigate graph dynamics and formally characterize notions of temporal stationarity and dynamic evolution.

The proposed work will advance the state of the art in probabilistic models, and rigorous statistical methods for analyis, of graph structured data. The proposed work also contributes to research-based training of graduate and undergraduate students at Purdue University. All of the software, publications, and data resulting from the project will be freely disseminated to the larger research and educational community.

Agency
National Science Foundation (NSF)
Institute
Division of Information and Intelligent Systems (IIS)
Application #
1219015
Program Officer
Sylvia Spengler
Project Start
Project End
Budget Start
2012-09-01
Budget End
2016-08-31
Support Year
Fiscal Year
2012
Total Cost
$491,841
Indirect Cost
Name
Purdue University
Department
Type
DUNS #
City
West Lafayette
State
IN
Country
United States
Zip Code
47907