The Gaussian process is a mathematical tool that can use incomplete data to fill in gaps, for example to interpolate the temperature at a person's house given a network of nearby weather stations. Gaussian processes are used in many application areas, such as geospatial analysis, machine learning, and the analysis of computer experiments. Gaussian processes are flexible, interpretable, and provide natural quantification of uncertainty. However, direct application of Gaussian processes is too computationally expensive for large datasets. This project addresses the computational challenges with novel algorithms and bridges the gap between statistical and machine learning approaches. As big data now appear in almost every field of science and society, providing powerful, scalable, and free software to analyze such datasets can have a transformative effect. This work will replace current practices and approximations for massive spatial data that are often simplistic due to computational limitations. This project can lead to improved accuracy and uncertainty quantification in countless applications with direct impact on society, including carbon monitoring, renewable energy, rainfall prediction, calibration of robotic arms, and modeling and prediction of insurgent activities. The developed methods and software will thus be an important tool for computational and data-enabled science and engineering. The investigators will mentor and train student researchers, and share the project findings via journal publications and conference presentations.

The goal of this project is to develop a nearly universal toolbox for scalable Gaussian process (GP) modeling. The toolbox is based on the ordered conditional approximation (OCA), a simple but very powerful idea that exploits the screening effect (i.e., conditional independence) exhibited by many popular covariance functions. The OCA framework unifies many state-of-the-art GP approximations from statistics, machine learning, and numerical linear algebra. This project will result in new, highly accurate OCA methods with guaranteed scalability and broad applicability for modeling and analysis of nonstationary, multivariate, multi-scale, and other processes. Also, extensions will be developed that allow these new spatial-statistics methods to be used in a variety of machine-learning applications, where OCA-type approaches have not received much attention so far. For the new methods, the computational cost is guaranteed to be linear in the data size, with further speed-ups possible through parallelization. All approaches will be implemented in easy-to-use open-source software. This will allow users to bring the power of GPs to bear on modern datasets, enabling spatial prediction, calibration, parameter learning, and nonparametric regression with big data.
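
To make the idea of conditioning on an ordered subset concrete, the following is a minimal sketch and not the project's software. It assumes that the ordered conditional approximation takes a Vecchia-type form: the joint density is factored by the chain rule in a fixed ordering, and each observation conditions only on at most m previously ordered nearest neighbors. The function names, the exponential covariance, and all parameter values are hypothetical choices made for illustration.

```python
import numpy as np

def exp_cov(a, b, variance=1.0, length_scale=0.3):
    """Exponential covariance between point sets a (n, d) and b (k, d)."""
    dists = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    return variance * np.exp(-dists / length_scale)

def exact_loglik(y, locs, nugget=1e-8):
    """Exact zero-mean GP log-likelihood; cubic cost in the data size."""
    n = len(y)
    cov = exp_cov(locs, locs) + nugget * np.eye(n)
    chol = np.linalg.cholesky(cov)
    alpha = np.linalg.solve(chol, y)
    return (-0.5 * alpha @ alpha
            - np.log(np.diag(chol)).sum()
            - 0.5 * n * np.log(2 * np.pi))

def ordered_conditional_loglik(y, locs, m, nugget=1e-8):
    """Sum of log p(y_i | y_c(i)), where c(i) holds at most m of the
    nearest points that come earlier in the ordering (assumed form)."""
    total = 0.0
    for i in range(len(y)):
        # Marginal mean and variance of y_i before conditioning.
        mean_i = 0.0
        var_i = exp_cov(locs[i:i + 1], locs[i:i + 1])[0, 0] + nugget
        if i > 0:
            # Brute-force neighbor search among previously ordered points.
            d = np.linalg.norm(locs[:i] - locs[i], axis=1)
            nbrs = np.argsort(d)[:m]
            k_nn = exp_cov(locs[nbrs], locs[nbrs]) + nugget * np.eye(nbrs.size)
            k_in = exp_cov(locs[i:i + 1], locs[nbrs])[0]
            w = np.linalg.solve(k_nn, k_in)
            mean_i = w @ y[nbrs]          # conditional mean
            var_i = var_i - w @ k_in      # conditional variance
        total += -0.5 * (np.log(2 * np.pi * var_i)
                         + (y[i] - mean_i) ** 2 / var_i)
    return total
```

Given the neighbor sets, each term costs O(m^3), so the likelihood evaluation is linear in the number of observations for fixed m; the brute-force neighbor search used here for brevity is quadratic and would be replaced by a spatial index in practice. Setting m to n - 1 recovers the exact likelihood by the chain rule. For the exponential covariance in one dimension with points ordered left to right and no nugget, the process is Markov, so m = 1 already reproduces the exact likelihood, which is the screening effect in its simplest form.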

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

National Science Foundation (NSF)
Division of Mathematical Sciences (DMS)
Standard Grant (Standard)
Program Officer: Yong Zeng
Cornell University
United States