Researchers in forestry, ecology, climate sciences, environmental health, and many other fields routinely use spatial statistics to analyze geo-tagged data collected at thousands of locations. Modern Geographical Information Systems (GIS) can simultaneously record many different variables at each location. This heralds a shift towards a multivariate paradigm in spatial statistics. A joint analysis of all the variables helps identify common geographical patterns and sources for the different variables. In this project, the PI pursues statistical methodology that can adequately address the emerging complexities of such highly multivariate geospatial datasets. The innovations include a) utilizing available scientific information about the dependence among variables, b) ensuring computational scalability of the algorithms, and c) improving interpretability of findings from the multivariate analysis. The genesis of the proposed innovations lies in substantive questions related to climate modeling and to air and water quality. These research domains study some of the most threatening challenges to human society in the twenty-first century. The statistical methods developed in this project will enable practitioners in these fields to conduct highly multivariate spatial analysis using modest computing resources. The project also provides the opportunity to train graduate students in many diverse and essential areas of statistics as well as in advanced statistical computing.
Gaussian Processes (GPs) have long been used for modeling multivariate spatial surfaces. Multivariate GPs are often created by mixing univariate ones, which obfuscates the individual spatial characteristics of each resultant surface. Direct constructions like the multivariate Matérn GP are more interpretable but entail complex parameter constraints and offer little flexibility to exploit prior information about inter-variable dependence. The PI proposes a novel procedure to create multivariate GPs that not only endows each surface with an interpretable GP measure with surface-specific variance, smoothness, and spatial decay, but also enables incorporating the dependency network among the variables into the construction. A recurrent theme throughout is the versatile exploitation of graphical models. Graphs defined in the space, time, and variable domains are used to create multivariate GPs with desirable properties in terms of interpretation, computation, and structure. Another accompanying theme is utilizing a standard decomposition of GPs to extend the discrete construction to well-defined continuous stochastic processes, thereby enabling predictions at any new location. Novel, simple, yet efficient strategies will be explored for parameter estimation. Finally, the PI separately focuses on non-Euclidean spatial domains like estuaries and river networks. New univariate GPs will be devised that respect the complicated contours of these domains. Subsequently, a harmonious application of graphical models will create multivariate, locally smooth GPs to analyze multivariate spatial data on such domains.
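As a rough illustration of why the mixing construction mentioned above blurs surface-specific characteristics, the following minimal sketch builds the joint covariance of two surfaces obtained by linearly mixing two independent univariate GPs. It is not the PI's proposed method. The exponential kernel, the mixing matrix `A`, the decay values, and all function names are assumptions chosen only for illustration. The marginal covariance of each mixed surface comes out as a weighted sum of both latent kernels, so no single variance or spatial decay describes it.

```python
import numpy as np

def exponential_kernel(locs, decay):
    """Exponential covariance exp(-decay * distance) for all pairs of locations."""
    dists = np.linalg.norm(locs[:, None, :] - locs[None, :, :], axis=-1)
    return np.exp(-decay * dists)

def mixing_covariance(locs, A, decays):
    """Joint covariance of w(s) = A u(s) for independent unit-variance latent GPs u_j.

    Rows and columns are ordered variable-major: variable 0 at all
    locations, then variable 1 at all locations, and so on.
    """
    q, n = A.shape[0], locs.shape[0]
    cov = np.zeros((q * n, q * n))
    for j, decay in enumerate(decays):
        a_j = A[:, j:j + 1]  # j-th column of the mixing matrix, shape (q, 1)
        cov += np.kron(a_j @ a_j.T, exponential_kernel(locs, decay))
    return cov

# Hypothetical setup: 5 random locations in the unit square, 2 variables.
rng = np.random.default_rng(0)
locs = rng.uniform(size=(5, 2))
A = np.array([[1.0, 0.5],
              [0.3, 2.0]])   # illustrative mixing weights
decays = [1.0, 10.0]         # one slowly and one rapidly decaying latent GP

cov = mixing_covariance(locs, A, decays)
n = locs.shape[0]

# Marginal covariance of the first surface: a blend of BOTH latent kernels.
marginal_0 = cov[:n, :n]
blend_0 = (A[0, 0] ** 2 * exponential_kernel(locs, decays[0])
           + A[0, 1] ** 2 * exponential_kernel(locs, decays[1]))
print(np.allclose(marginal_0, blend_0))
```

Because each surface inherits a blend of all latent decay rates, its individual spatial behavior cannot be read off any single parameter, which is the interpretability problem that motivates direct, surface-specific constructions.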
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.