This proposal develops a new approach to the challenges of statistical modeling and analysis of massive spatial data sets. Because such data are complex and very large, conventional statistical methods run into computational difficulties, and methods that can analyze, model and make inferences from large data sets have become indispensable in current research and applications in the environmental, earth and biological sciences. Two approaches have recently been proposed, but each has drawbacks. One approach, based on low-rank approximation of covariance functions, models large-scale spatial variability well but may fail to capture small-scale behavior adequately. The other approach, based on sparse matrix approximation, appears to work better when the spatial data have only relatively small-scale dependence. The investigators propose a new approach that combines the two to provide a high-quality approximation to the covariance function at both large and small spatial scales. Specific research projects will include parameter estimation for various geostatistical models, data imputation for missing satellite measurements, spatial-temporal modeling for detection and prediction of global climate change, multivariate spatial models for multivariate satellite measurements, and non-Gaussian spatial models for characterization of extreme environmental events.
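The combination described above can be illustrated with a small numerical sketch. It is not the investigators' implementation: the exponential covariance, the spherical taper, the one-dimensional locations, the knot placement and all function names (`exp_cov`, `spherical_taper`, `full_scale_cov`) are assumptions chosen for illustration. The low-rank part is built from a small set of knots, and the residual it leaves behind is multiplied by a compactly supported taper so that it becomes sparse.

```python
import numpy as np

def exp_cov(a, b, range_=0.3, sigma2=1.0):
    """Exponential covariance between 1-D location vectors a and b (assumed model)."""
    d = np.abs(a[:, None] - b[None, :])
    return sigma2 * np.exp(-d / range_)

def spherical_taper(a, b, taper_range=0.1):
    """Compactly supported spherical taper: (1 - h)^2 (1 + h/2) for h < 1, else 0."""
    h = np.abs(a[:, None] - b[None, :]) / taper_range
    t = (1.0 - h) ** 2 * (1.0 + h / 2.0)
    return np.where(h < 1.0, t, 0.0)

def full_scale_cov(s, knots, taper_range=0.1):
    """Return the low-rank part and the tapered (sparse) residual part."""
    C = exp_cov(s, s)                 # exact covariance, dense; kept only for this small demo
    C_sk = exp_cov(s, knots)          # cross-covariance between locations and knots
    C_kk = exp_cov(knots, knots)      # covariance among the knots
    low_rank = C_sk @ np.linalg.solve(C_kk, C_sk.T)
    residual = C - low_rank           # small-scale structure missed by the low-rank part
    sparse_part = residual * spherical_taper(s, s, taper_range)
    return low_rank, sparse_part

s = np.linspace(0.0, 1.0, 200)        # hypothetical observation locations
knots = np.linspace(0.0, 1.0, 10)     # hypothetical knot locations
C = exp_cov(s, s)
low_rank, sparse_part = full_scale_cov(s, knots)
approx = low_rank + sparse_part

print("max error, low rank only:", np.abs(C - low_rank).max())
print("max error, combined     :", np.abs(C - approx).max())
print("nonzero fraction of sparse part:", np.count_nonzero(sparse_part) / sparse_part.size)
```

Because the taper equals one at distance zero, the combined approximation reproduces the exact variances on the diagonal, and its elementwise error never exceeds that of the low-rank part alone. A practical implementation would store the tapered residual in a sparse matrix format instead of forming dense matrices as this demo does.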

With rapid advances in science and technology, large amounts of spatial data are generated from various sources, including remote ground sensors, satellite images, computer climate models, Geographic Information Systems, public health and spatial genetics. The proposed methods will make it possible to analyze, model and make inferences about massive spatial data sets, and will benefit researchers and practitioners in the environmental, earth and biological sciences. Research results will be disseminated through collaborative work, academic presentations, and journal publications. Web pages will be created to give quick access to user-friendly software implementations of the new methods, as well as technical reports and relevant references.

Project Report

With rapid advances in science and technology, large amounts of spatial data are generated from various sources. Because of the complexity and sheer size of the data, conventional spatial statistical methods often face critical computational challenges. We have developed new theories, methodologies and computational tools for the analysis of spatial problems. Specifically, we have proposed a novel covariance approximation method, referred to as the full-scale approximation (FSA), to facilitate computations for Gaussian processes (GP). This approach combines the merits of reduced-rank and sparse matrix algorithms to speed up maximum likelihood estimation, spatial prediction and Bayesian inference. We have also extended this approach to various spatial modeling settings, including spatial-temporal, multivariate, non-stationary and non-Gaussian geostatistical data sets. The methods have also proved useful in other scientific fields: we have applied them to a wide range of real problems, including satellite ozone data, precipitation data from multiple sources, temperature and wind data, and fMRI data. We have turned our research results into journal articles and manuscripts, academic presentations and openly available computer programs. We have published twenty-two journal articles in leading statistical journals such as JASA, Annals of Statistics, JRSS-B, Biometrika and Journal of Machine Learning Research. We have given numerous invited presentations to disseminate our findings and results. We have also developed Matlab toolboxes for computation with massive nonstationary spatial data sets, which are published as a supplementary file in the Journal of Computational and Graphical Statistics.
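One way to see why a covariance made of a reduced-rank part plus a sparse part eases likelihood computations is through two standard matrix identities. The notation here is assumed for illustration and does not come from the report: write the approximated covariance of n observations as the sum of a sparse n x n matrix D (for example, a tapered residual plus a measurement-error variance) and a rank-m term built from an n x m matrix B and an m x m knot covariance K, with m much smaller than n.

```latex
\Sigma = B K^{-1} B^{\top} + D,
\qquad
\Sigma^{-1} = D^{-1} - D^{-1} B \left( K + B^{\top} D^{-1} B \right)^{-1} B^{\top} D^{-1},
\qquad
\det \Sigma = \det\!\left( K + B^{\top} D^{-1} B \right) \, \det(K)^{-1} \, \det(D).
```

Under these assumptions, the Gaussian log-likelihood needs only sparse solves with D and dense operations on m x m matrices, instead of a dense factorization of the full n x n covariance.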

Agency: National Science Foundation (NSF)
Institute: Division of Mathematical Sciences (DMS)
Application #: 1007618
Program Officer: Gabor Szekely
Project Start:
Project End:
Budget Start: 2010-07-15
Budget End: 2014-06-30
Support Year:
Fiscal Year: 2010
Total Cost: $179,660
Indirect Cost:
Name: Texas A&M Research Foundation
Department:
Type:
DUNS #:
City: College Station
State: TX
Country: United States
Zip Code: 77845