In this collaborative proposal, the research team aims to produce regional decadal-scale climate forecasts from paleo proxy data. More generally, the researchers will improve existing analytical methods and develop new statistical methods for reconstructing past climates from heterogeneous geological proxies (tree rings, ice cores, speleothems, corals, sediments), with particular emphasis on regional variability over the past two millennia.

Specifically, the research team will use the regularized expectation-maximization (RegEM) algorithm to exploit linear covariation in space among different climate variables or proxies, in order to impute missing values and estimate climate statistics. The goal is to directly address uncertainties in the data, especially those arising from missing values within a series. This is important for paleoclimate data sets because their spatial or temporal series are often not continuous.
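The imputation idea described above can be illustrated with a minimal sketch. It is not the RegEM implementation: it assumes a plain ridge penalty with a fixed, hand-picked strength, and it omits the residual-covariance correction that a full EM iteration adds to the covariance estimate. The function name `ridge_em_impute` and its parameters are hypothetical.

```python
import numpy as np

def ridge_em_impute(X, ridge=0.1, n_iter=50, tol=1e-6):
    """Fill NaNs in X (rows = time steps, columns = variables or proxies).

    Simplified illustration: alternates between estimating the mean and
    covariance from the filled-in data and re-imputing the missing values
    by ridge regression on the available values in the same row.
    """
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    # Initial guess: column means of the available values.
    filled = np.where(missing, np.nanmean(X, axis=0), X)
    for _ in range(n_iter):
        mean = filled.mean(axis=0)
        cov = np.cov(filled, rowvar=False)
        previous = filled.copy()
        for i in np.flatnonzero(missing.any(axis=1)):
            m = missing[i]   # variables missing in this row
            a = ~m           # variables available in this row
            if not a.any():
                continue     # nothing observed; keep the mean
            # Ridge-regularized regression of missing on available variables
            # (assumed fixed penalty; RegEM chooses its regularization differently).
            C_aa = cov[np.ix_(a, a)] + ridge * np.eye(a.sum())
            C_am = cov[np.ix_(a, m)]
            B = np.linalg.solve(C_aa, C_am)
            filled[i, m] = mean[m] + (filled[i, a] - mean[a]) @ B
        if np.max(np.abs(filled - previous)) < tol:
            break
    return filled
```

Observed entries are never modified; only the entries that were NaN in the input are updated from one iteration to the next.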

The broader impacts involve the training of students across disciplinary boundaries in a timely integration of statistics and climatology.

Project Report

Our ability to produce reliable decadal-scale climate forecasts at regional scales depends on our ability to validate climate models at those scales, which is limited by the shortness and sparsity of the instrumental record. Circumventing this limitation requires extending climate records beyond the instrumental period, which can be done by exploiting climate-sensitive proxies (e.g., tree rings, isotopic composition of corals) to statistically reconstruct variables and indices describing regional variability. Doing so requires methods to estimate the missing climate data (e.g., temperatures) from the available proxies (e.g., tree rings). Thus, reconstructing past climates is an incomplete-data problem, one that is particularly challenging because of the sparsity of available proxies and the high dimensionality of the data one wishes to reconstruct (e.g., temperatures on a global grid with thousands of points). The goal of this project is to use methods from modern statistics and applied mathematics to improve methods for reconstructing past climates from proxies. We took as the starting point the regularized expectation-maximization (RegEM) algorithm, which was previously developed by the PI and is now widely used for paleo-climate reconstructions. However, as currently used, the RegEM algorithm is not ideal: for paleo-climate problems with limited spatial information about climate changes, it can overdamp (excessively smooth) spatial variations (e.g., in temperatures). A variant of the algorithm, also developed by the PI, uses a regression method called truncated total least squares (TTLS), which can alleviate this problem. As currently used, however, it cannot adapt the level of spatial smoothing it imposes to the structure of the proxy data and their signal-to-noise level. A central aim for us was to improve upon this situation by developing a version of the RegEM algorithm with an objective and data-adaptive choice of smoothing.
We have succeeded in doing so and now have a RegEM algorithm with a data-adaptive TTLS regression, in which the smoothing is chosen by a procedure known as K-fold cross-validation. In essence, the procedure successively leaves out portions of the data in the reconstruction, uses the left-out portion to assess the goodness of the reconstruction (as measured by a variety of possible metrics), and repeats this to determine an optimal smoothing that gives the "best" reconstruction (as measured by the chosen metric). We have tested this method with synthetic data and with "pseudo-proxies," that is, degraded output from climate models that mimics some of the statistical properties of actual climate proxies. So far, the tests are very promising: with the synthetic data, we consistently and reliably obtain much improved reconstructions; tests with the pseudo-proxies are ongoing but also point to a possibly substantial improvement. The next step will be to use this improved algorithm with actual climate proxies to obtain a new reconstruction, e.g., of temperatures over the past two millennia.
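The cross-validation procedure described above can be sketched as follows. This is only an illustration under stated assumptions, not the project's code: it applies TTLS to a complete-data regression of climate variables `B` on proxies `A` (the actual method works inside the RegEM iteration on incomplete data), it uses mean squared prediction error as the metric, and the function names `ttls_fit` and `choose_truncation` are hypothetical. The TTLS coefficients are computed with the standard formula from the right singular vectors of the augmented matrix [A B].

```python
import numpy as np

def ttls_fit(A, B, k):
    """Truncated total least squares coefficients X (B approx. A @ X).

    k is the truncation parameter: the number of leading right singular
    vectors of [A B] that are retained. Smaller k means more smoothing.
    """
    p = A.shape[1]
    _, _, Vt = np.linalg.svd(np.hstack([A, B]), full_matrices=False)
    V = Vt.T
    V11, V21 = V[:p, :k], V[p:, :k]
    # Minimum-norm TTLS solution expressed with the retained vectors only.
    return np.linalg.pinv(V11.T) @ V21.T

def choose_truncation(A, B, candidates, n_folds=5, seed=0):
    """Pick the truncation k with the lowest K-fold held-out squared error."""
    n = A.shape[0]
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n), n_folds)
    scores = {}
    for k in candidates:
        sq_err = 0.0
        for held_out in folds:
            train = np.setdiff1d(np.arange(n), held_out)
            # Center with training-set means only, so no held-out data leaks in.
            a_mean, b_mean = A[train].mean(axis=0), B[train].mean(axis=0)
            X = ttls_fit(A[train] - a_mean, B[train] - b_mean, k)
            pred = b_mean + (A[held_out] - a_mean) @ X
            sq_err += np.sum((B[held_out] - pred) ** 2)
        scores[k] = sq_err / B.size  # mean squared held-out error
    best = min(scores, key=scores.get)
    return best, scores
```

A call such as `choose_truncation(A, B, range(1, A.shape[1] + 1))` returns the selected truncation together with the score for every candidate, so the shape of the error curve can be inspected rather than trusting the minimum alone.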

Agency: National Science Foundation (NSF)
Institute: Division of Atmospheric and Geospace Sciences (AGS)
Type: Standard Grant (Standard)
Application #: 1003614
Program Officer: David J. Verardo
Budget Start: 2010-06-15
Budget End: 2014-05-31
Fiscal Year: 2010
Total Cost: $89,058
Name: California Institute of Technology
City: Pasadena
State: CA
Country: United States
Zip Code: 91125