In modern data science, networks have emerged as one of the most important and ubiquitous types of non-traditional data. Data sets containing a large number of independent network-valued samples have recently become increasingly available. In such data sets, each network serves as the basic data object. They are common in neuroscience, genetics, microbiome research, and social cognitive studies. Such data pose statistical challenges that existing tools cannot adequately address. This project seeks to provide foundational perspectives on the emerging inferential and computational challenges in modeling one or more populations of networks. The theory and methods developed here will make it possible to characterize network connectivity at the population level and to track how subject-level connectivity changes as a function of subject characteristics. Quantifying such subject-level differences has become central to the study of the human brain, genetics, and medicine in general. Although motivated by applications in neuroscience, this research will benefit a variety of fields that study brain development, aging, and disease diagnosis, progression, and treatment. Research and education will be integrated through the training of undergraduate and graduate students and the development of special-topics graduate courses.
This project aims to develop a new network response model framework, in which the networks are treated as responses and the network-level covariates as predictors. Under appropriate structural constraints, the framework will preserve the intrinsic characteristics of networks, ensure model identifiability, support scalable computation, and allow valid statistical inference. Within this framework, the project will address several fundamental computational and inferential challenges, including model identifiability, efficient computation, the quantification of computational and statistical errors, and debiased inference. In addition, the investigator will develop two novel goodness-of-fit tests for a broad class of network models, including those considered in this project. The investigator will also study heterogeneity by developing a network mixed-effects model and a framework for model-based network clustering. Both developments are designed to incorporate the rich information contained in subject covariates. The theory will be developed under asymptotic regimes that allow the network size, the number of network samples, and the model complexity (e.g., rank, sparsity, number of clusters) to grow at reasonable rates.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.