Global weather and climate models represent the atmosphere on computational grids with horizontal spacing of perhaps 100 km, stacked in layers which can be over a kilometer thick. Such grids suffice to capture the dynamics of cyclones, fronts, and other large-scale atmospheric phenomena, but these phenomena depend critically on processes with spatial scales much smaller than the grid spacing. The small-scale processes must be represented indirectly, through parameterization schemes which estimate their net impact on the resolved atmospheric state. For example, clouds are typically smaller than the grid spacing, yet they are critical for moving moisture from the ocean surface to the mid-troposphere. Cloud parameterizations therefore play a key role in determining atmospheric humidity even on the largest spatial scales. Parameterization schemes are inherently approximate, and the development of schemes which produce realistic simulations is a central challenge of model development. Shortcomings in parameterization limit the usefulness of weather and climate models both for scientific research and for societal applications.

Most parameterization schemes depend critically on various parameters whose values cannot be determined a priori but must instead be found through trial and error. This task, referred to as "tuning", is laborious as it is performed separately for each parameterization scheme and involves multiple integrations of the model in multiple configurations. It is also inefficient in its use of observations, which is unfortunate given the large amount of observational data available from satellites and other sources. The resulting parameter sets may not be optimal and may produce unexpected results when all the schemes interact with each other in global simulations. Finally, manual tuning is not conducive to uncertainty quantification, which would be valuable for estimating the uncertainty in future climate change projections.

The goal of this project is to replace ad hoc manual tuning with a combination of data assimilation, machine learning, and fine-scale process modeling using large eddy simulation (LES) models. LES models have grid spacings of a few tens of meters and can explicitly simulate the clouds and turbulence represented by parameterization schemes. These ingredients are combined to create a global Machine Learning Atmospheric Model (MLAM), in which LES models embedded in selected grid columns of a global model explicitly simulate subgrid-scale processes which are represented by parameterization schemes in the other columns. Machine learning is used to tune the schemes to emulate the behavior of the LES simulations, so that explicit simulations become an online benchmark for parameterization. In this way all the schemes can be tuned together and interactively within a running global simulation. Observational data from a variety of sources is assimilated during the model integration to provide a further constraint on parameter values, and estimates of parameter uncertainty are generated as part of the automated tuning. A similar tuning process is implemented in an ocean general circulation model, and the two are combined to produce a machine learning climate model. Model tuning is generally viewed as a necessary but mundane activity which is not in itself a research topic. But a model capable of learning its parameters from observations and process models offers a new path forward, toward both better models and better ways of using models.
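The abstract does not name the learning algorithm used for tuning. As a purely illustrative sketch of learning scheme parameters from high-resolution reference data with an uncertainty estimate, the example below assumes ensemble Kalman inversion as the tuner. The two-parameter `toy_scheme`, the synthetic "LES" reference profile, the noise level, and all function names are hypothetical stand-ins, not the project's actual method or code.

```python
import numpy as np


def toy_scheme(theta, heights):
    """Hypothetical parameterization: a flux profile that decays with height.

    theta[0] is a decay rate and theta[1] an amplitude (both made up for illustration).
    """
    return theta[1] * np.exp(-theta[0] * heights)


def eki_update(ensemble, forward, reference, noise_cov, rng):
    """One ensemble Kalman inversion step with perturbed reference data.

    ensemble: (n_members, n_params) array of candidate parameter vectors.
    forward:  maps one parameter vector to predicted statistics.
    """
    outputs = np.array([forward(member) for member in ensemble])
    n_members = ensemble.shape[0]

    # Ensemble anomalies give sample covariances without any gradients.
    theta_anom = ensemble - ensemble.mean(axis=0)
    out_anom = outputs - outputs.mean(axis=0)
    cov_theta_out = theta_anom.T @ out_anom / (n_members - 1)
    cov_out_out = out_anom.T @ out_anom / (n_members - 1)

    # Each member is nudged toward its own noisy copy of the reference data.
    perturbed = reference + rng.multivariate_normal(
        np.zeros(reference.size), noise_cov, size=n_members
    )
    innovation = np.linalg.solve(cov_out_out + noise_cov, (perturbed - outputs).T)
    return ensemble + (cov_theta_out @ innovation).T


def tune(n_members=50, n_iter=20, seed=0):
    """Tune toy_scheme against synthetic reference statistics (assumed setup)."""
    rng = np.random.default_rng(seed)
    heights = np.linspace(0.0, 2.0, 10)

    # Synthetic stand-in for LES output: the toy scheme at known parameters plus noise.
    true_theta = np.array([1.5, 2.0])
    noise_cov = 0.05**2 * np.eye(heights.size)
    reference = toy_scheme(true_theta, heights) + rng.multivariate_normal(
        np.zeros(heights.size), noise_cov
    )

    # Initial ensemble drawn from a deliberately wrong prior guess.
    ensemble = rng.normal(loc=[1.0, 1.0], scale=[0.5, 0.5], size=(n_members, 2))
    prior_mean = ensemble.mean(axis=0)

    for _ in range(n_iter):
        ensemble = eki_update(
            ensemble, lambda th: toy_scheme(th, heights), reference, noise_cov, rng
        )

    # Ensemble mean is the tuned estimate; ensemble spread is a crude uncertainty proxy.
    return ensemble.mean(axis=0), ensemble.std(axis=0), true_theta, prior_mean


if __name__ == "__main__":
    mean, spread, truth, prior_mean = tune()
    print("prior mean:", prior_mean)
    print("tuned mean:", mean, "spread:", spread, "truth:", truth)
```

In this sketch every parameter is updated jointly from one set of reference statistics, which mirrors the idea of tuning schemes together and not one at a time. An ensemble method is used here only because it needs no gradients of the scheme. The ensemble spread is a rough stand-in for the parameter uncertainty estimates mentioned above.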

The work has broader impacts due to the societal value of better forecasts and projections from weather and climate models. It directly addresses uncertainty in the forecasts and projections used by decision makers to plan for weather and climate impacts. In addition, the modeling strategy developed here is applicable to a broad class of research areas which face the problem of relating large-scale behaviors to small-scale unresolved processes (the problem of relating genotypes to phenotypes in evolutionary biology, for example). The PIs will also establish a cross-disciplinary graduate program on data-driven Earth system modeling. The program bridges the gap between environmental and computational sciences which currently hinders progress in environmental modeling.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Agency: National Science Foundation (NSF)
Institute: Division of Atmospheric and Geospace Sciences (AGS)
Application #: 1835860
Program Officer: Eric DeWeaver
Project Start:
Project End:
Budget Start: 2018-11-01
Budget End: 2023-10-31
Support Year:
Fiscal Year: 2018
Total Cost: $2,499,842
Indirect Cost:
Name: California Institute of Technology
Department:
Type:
DUNS #:
City: Pasadena
State: CA
Country: United States
Zip Code: 91125