Recent years have seen rapid growth in the depth, richness, and scope of scientific data, a trend that is likely to accelerate. At the same time, simulation and analytical models have sharpened, to unprecedented detail, the understanding of the processes that generate these data. What has advanced more slowly is the methodology for efficiently combining the information from rich, massive data sets with the detailed, and often nonlinear, constraints of theory and simulations. This project will bridge that gap. The investigators develop, implement, and disseminate new statistical methods that fully exploit the available data while adhering to the constraints imposed by current theoretical understanding. The central idea is to construct sparse, possibly nonlinear, representations of both the data and the distributions for the data predicted by theory. These representations can then be transformed onto a common space to allow sharp inferences that respect the inherent geometry of the model. The methodology developed in this project will apply to a wide range of scientific problems. The investigators focus, however, on a critical challenge in astronomy: using observations of Type Ia supernovae to improve constraints on cosmological theories explaining the nature of dark energy, a significant, yet little-understood, component of the Universe.

Crucial scientific fields have enjoyed huge advances in the ability both to gather high-quality data and to understand the physical systems that generated these data. Nevertheless, the full societal and scientific value of this progress will only be realized with new, advanced statistical methods for analyzing the massive amounts of available data. The investigators develop statistical methods for combining theoretical modelling and observational evidence into an improved understanding of these physical processes. The analysis of these data will require not only new methods, but also the use of high-performance computing resources. There is a particular need for these tools in cosmology and astronomy, and this project will bring together statisticians and astronomers to combine their expertise. The research is also motivated by problems that are present in other fields, such as the climate sciences.

Project Report

The major goals of this work were the development of sophisticated, rigorous methods of statistical inference that take full advantage of both the rich data that arise from astronomical surveys and the deep understanding of the physical processes that generate these data. For example, how does one work with a data set whose individual components are images of galaxies? And how does one perform statistical inference when the model that relates the parameters of interest to the observable data is very complex? To answer these questions, we worked, and continue to work, to develop and implement statistical methods that can deal with this complexity. For example, one result has been ways of summarizing a galaxy image in low-dimensional form so that useful information is preserved. The initial motivation for these representations was to identify galaxies of unusual nature; this is an important problem as we face massive collections of such images and must develop automated ways of searching the collection for interesting cases. This work is currently being extended to use these low-dimensional representations for other inference goals. As another example, we are developing methods for estimating key physical constants in cases where the relationship between those constants and the observable data is, at least in part, available only through simulation. This is a problem of increasing importance. As the quality of simulation models increases, we need to build methods of statistical analysis that can take advantage of these tools. Existing techniques largely require that models be written as a system of equations. Our results from this work have extended methods used in genetics to make them feasible in astronomy and cosmology. The initial work has led to multiple further projects in this area.
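To make the simulation-based estimation problem concrete, here is a minimal sketch of rejection-style approximate Bayesian computation (ABC), a family of methods with roots in population genetics. The report does not name the technique it extended, so the choice of ABC is an assumption. The toy simulator, the flat prior, the sample-mean summary, and the tolerance are all hypothetical choices made for illustration, not the investigators' implementation.

```python
import random
import statistics


def simulate(mu, n, rng):
    """Hypothetical forward simulator: n noisy observations with unknown mean mu."""
    return [rng.gauss(mu, 1.0) for _ in range(n)]


def abc_rejection(observed, n_draws, tolerance, rng):
    """Keep prior draws whose simulated summary lands close to the observed one."""
    obs_summary = statistics.fmean(observed)
    accepted = []
    for _ in range(n_draws):
        mu = rng.uniform(-5.0, 5.0)  # assumed flat prior on the parameter
        sim_summary = statistics.fmean(simulate(mu, len(observed), rng))
        # No likelihood is evaluated; only simulated and observed summaries are compared.
        if abs(sim_summary - obs_summary) < tolerance:
            accepted.append(mu)
    return accepted


rng = random.Random(0)
observed = simulate(2.0, 50, rng)  # stand-in for real data, true mean 2.0
posterior_sample = abc_rejection(observed, n_draws=20000, tolerance=0.1, rng=rng)
print(len(posterior_sample), statistics.fmean(posterior_sample))
```

The accepted draws approximate the posterior for the parameter without the model ever being written as a system of equations; only the ability to run the simulator is needed. In realistic settings the cost of each simulation and the choice of summary statistics dominate, which is where low-dimensional representations of the data become relevant.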

National Science Foundation (NSF)
Division of Mathematical Sciences (DMS)
Standard Grant (Standard)
Program Officer
Gabor J. Szekely
Carnegie-Mellon University
United States