Spatial data sets are analyzed in many scientific disciplines, such as ecology, geology, and the environmental sciences. However, classical approaches such as Kriging and Bayesian hierarchical Gaussian modeling often break down for large data sets because they rely on expensive matrix inversions, whose computational cost grows cubically with the number of spatial locations. To alleviate this difficulty, various approximation approaches have been proposed, including covariance tapering, approximation of the spatial process in a lower-dimensional space, likelihood approximation, and Markov random field approximation. All of them share the general idea of approximating the original spatial model with a computationally convenient one, and a general concern with them is the adequacy of the approximation. In this proposal, the investigators propose three new approaches: the Bayesian auxiliary lattice approach, the Bayesian site selection approach, and the marginal inference approach. The Bayesian auxiliary lattice approach introduces an auxiliary lattice on the space of observations and defines a hidden Gaussian Markov random field on that lattice. By exploiting analytical results for Gaussian Markov random fields, it completely avoids matrix inversion in likelihood evaluation. The Bayesian site selection approach reformulates spatial model estimation as a Bayesian variable selection problem. It works with only a small proportion of the data at each iteration and thus significantly reduces the dimension of the data. The marginal inference approach is based on the idea of bootstrap resampling. Like the Bayesian site selection approach, it works with only a small proportion of the data at each iteration.
It is worth noting that the Bayesian site selection and marginal inference approaches are conceptually very different from the approximation approaches in the existing literature. Existing approaches approximate the original model with a computationally convenient one. By contrast, the Bayesian site selection and marginal inference approaches reduce the dimension of the data without sacrificing the complexity of the original model. In this proposal, the investigators also extend the proposed approaches to spatio-temporal models, with applications to satellite climate data. The treatment of missing data in spatio-temporal models is also addressed.
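The cubic cost mentioned above, and the saving from working with a small proportion of the sites, can be illustrated with a minimal Python sketch. It is not the investigators' method: the exponential covariance with a nugget, the parameter values, and the function names `exp_cov`, `gauss_loglik`, and `subset_loglik` are all assumptions chosen for illustration, and the uniform random subset merely stands in for whatever site selection or resampling scheme is actually used.

```python
import numpy as np

def exp_cov(coords, sigma2=1.0, phi=0.3, tau2=0.1):
    """Exponential covariance plus nugget (an assumed model, for illustration only)."""
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    return sigma2 * np.exp(-d / phi) + tau2 * np.eye(len(coords))

def gauss_loglik(y, coords, **params):
    """Zero-mean Gaussian log-likelihood; the Cholesky factorization costs O(n^3)."""
    n = len(y)
    L = np.linalg.cholesky(exp_cov(coords, **params))
    z = np.linalg.solve(L, y)  # z = L^{-1} y, so z @ z = y' Sigma^{-1} y
    return -0.5 * (z @ z) - np.log(np.diag(L)).sum() - 0.5 * n * np.log(2 * np.pi)

def subset_loglik(y, coords, m, rng, **params):
    """Same likelihood on m randomly chosen sites: O(m^3) instead of O(n^3)."""
    idx = rng.choice(len(y), size=m, replace=False)
    return gauss_loglik(y[idx], coords[idx], **params)

# Simulate a small field and evaluate both versions.
rng = np.random.default_rng(0)
n, m = 400, 40
coords = rng.uniform(size=(n, 2))
y = np.linalg.cholesky(exp_cov(coords)) @ rng.standard_normal(n)
full = gauss_loglik(y, coords)
sub = subset_loglik(y, coords, m, rng)
print(full, sub)
```

With m equal to one tenth of n, each subset evaluation needs roughly a thousandth of the factorization work of the full likelihood, which is the kind of saving that makes iterating over many small subsets attractive.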

The intellectual merit of this project lies in providing computationally efficient or dimension-reducing approaches for the statistical analysis of large spatial data. The new approaches address core problems in spatial data analysis, such as large matrix inversion and missing data imputation, and are expected to play a major role in the statistical analysis of geostatistical data, satellite climate data, and other large spatial data. This project will have broader impacts in both the spatial statistics and computational atmospheric sciences communities. The research results will be disseminated to these communities through direct collaboration with researchers in other disciplines, conference presentations, books, and papers published in academic journals. The project will also have a significant impact on education through the direct involvement of graduate students and the incorporation of results into undergraduate and graduate courses.

National Science Foundation (NSF)
Division of Mathematical Sciences (DMS)
Standard Grant (Standard)
Program Officer: Gabor J. Szekely
Texas A&M Research Foundation
College Station
United States