The project will initiate a systematic study of statistical models for network data and other complex data structures that commonly arise from interacting and self-organizing processes such as protein folding, gene expression, neural functioning, economic activity, and social behavior. A focus of the project is to address novel challenges that cannot be handled by common approaches to statistical modeling and inference, which often rely on assumptions of (i) statistical regularity, (ii) short-range dynamics driven by forces exogenous to the data, and (iii) representability of the data as the aggregation of isolated measurements taken on a representative sample of units. These assumptions are often violated in complex data problems, which are characterized by (i) high levels of interaction among different components of the data, (ii) dynamical behaviors driven by endogenous feedback mechanisms, and (iii) partial or complete irreducibility of the data structure. The project will produce new methodologies, theoretical results, and conceptual insights for statistical inference in these settings. Beyond substantive technical contributions, which will have an impact across scientific domains, research from the project will be widely disseminated to the general public through the PI's participation in forums for communicating probability and statistics to interdisciplinary audiences. The PI trains graduate students and runs a weekly seminar on the Foundations of Probability and Statistics, with videos uploaded for open public access on his website. In addition, the PI advocates strongly for peer review reform and open source publication, and will publish all work from this project for public peer review on Researchers.One, a non-profit publishing outlet aimed at increasing the quality and accessibility of peer review across research disciplines.
To achieve these aims, the project will develop rigorous theory and robust statistical methods for analyzing dynamic and complex network data structures. Desired outcomes include new theory, models, methods, and concepts for network analysis, a deeper understanding of the scope and limitations of statistical tools for modern network analysis, and a general framework for modeling network data that arises across scientific disciplines. Model development lies at the core of the project, with a focus on extending recently proposed model classes, namely edge and relationally exchangeable network models, rewiring models, and graph-valued Lévy process models, to a flexible statistical framework for latent space relational models and network-valued autoregressive and state space models. From these models, the project will produce a theoretical framework as well as a range of methodological tools for future developments in statistical network analysis. The project will draw on concepts and techniques from a wide range of topics including Bayesian nonparametrics, spatial statistics, time series, probability theory, stochastic processes, and computing, as well as mathematical concepts from graph theory, combinatorics, and algebra. The research will, therefore, contribute substantially to disciplines across the mathematical sciences, where network and complex data analysis have become increasingly relevant for scientific research in proteomics, genomics, economics, social science, finance, biology, computer science, and physics, as well as to methodologically driven disciplines within statistics and related fields, such as data science, artificial intelligence, and machine learning.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.