Scientific data increasingly takes the form of networks, but our ability to collect and process graph data has outpaced our ability to analyze it statistically. Most of the critical scientific questions about networks revolve around comparisons between networks (e.g., over time or across experimental conditions). Such problems arise in fields as different as neuroscience, epidemiology, economics, climatology, and criminology. Currently, network analysts compare only basic descriptive statistics (e.g., the average distance between nodes in a graph), ignoring issues of global structure and statistical validity. We will develop a rigorous statistical theory of network comparisons. Our approach rests on recent developments in network theory which show that large graphs approximate continuous geometric objects, so that tools for geometric comparison can be applied to networks.
Our project will develop rigorous statistical methods and efficient algorithms for network comparisons. The first step is flexible non-parametric estimation of continuous network models, where we will pursue three complementary strategies: regression smoothing, density estimation in non-Euclidean latent spaces, and ensembles of trees. Having represented networks as continuous stochastic processes, we will develop statistical theory and methods for detecting and characterizing differences between such processes. Interdisciplinary proof-of-concept applications in public health (through online social networks), finance (through financial networks), neuroscience (through brain connectivity networks), genetics (through gene regulatory networks), and proteomics (through protein interaction networks) will demonstrate the power of the geometric approach to comparing large and disparate networks.
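To make the estimate-then-compare idea concrete, a minimal sketch follows. It is not the proposed methodology, only an illustration of the pipeline: estimate a continuous network model from each adjacency matrix, here with a simple piecewise-constant (histogram-style) estimator obtained by sorting nodes by degree and averaging edge densities over blocks, and then compare two networks through the distance between their estimated models. The function names `graphon_histogram` and `graphon_distance`, the block count `k`, and the L2 distance are all illustrative assumptions.

```python
import numpy as np

def graphon_histogram(A, k):
    """Illustrative piecewise-constant estimate of a continuous network
    model: sort nodes by degree, then average edge densities over a
    k-by-k grid of node blocks. A is a symmetric 0/1 adjacency matrix."""
    n = A.shape[0]
    order = np.argsort(-A.sum(axis=1))      # sort nodes by decreasing degree
    A = A[np.ix_(order, order)]             # reorder rows and columns together
    blocks = np.array_split(np.arange(n), k)
    W = np.empty((k, k))
    for i, bi in enumerate(blocks):
        for j, bj in enumerate(blocks):
            W[i, j] = A[np.ix_(bi, bj)].mean()  # edge density of block (i, j)
    return W

def graphon_distance(A1, A2, k=8):
    """Compare two networks via the root-mean-square difference between
    their block-averaged estimates (a crude stand-in for a geometric
    comparison of the underlying continuous models)."""
    W1 = graphon_histogram(A1, k)
    W2 = graphon_histogram(A2, k)
    return np.sqrt(((W1 - W2) ** 2).mean())
```

Sorting by degree before block-averaging is what makes the comparison meaningful across networks of different sizes and node labelings: two graphs generated from the same continuous model should yield similar block-density matrices even though their node orderings differ.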