Massive new data resources are coming online in every area of human exploration, changing the way researchers approach data analysis. All-purpose algorithms are expected to identify patterns in data with little tailoring to the problem at hand. While this all-purpose approach is understandable given the massive computational challenges of "big data," it often sacrifices our ability to understand the underlying scientific processes. On the other hand, explicitly embedding scientific models into massive statistical analyses may pose significant computational challenges of its own. This project navigates the tension between such all-purpose or "data-driven" methods and specially designed or "science-driven" methods using a suite of data analytic projects in astro- and solar physics. The work is motivated by recent advances in space-based instrumentation that are increasing both the quality and the quantity of data available to astronomers. Several projects focus on developing new methods for detecting and characterizing astronomical sources by combining information from observations made across the electromagnetic spectrum, including high-resolution spectroscopy, imaging, and time series. Other projects investigate methods for extracting useful features from ultra-high-resolution images of the Sun, with the ultimate aim of predicting explosive dynamic processes in the solar atmosphere. The CHASC International Center for Astrostatistics has a track record of designing methods that leverage efficient data-driven techniques while still incorporating scientific understanding of the astronomical sources and retaining the ability to answer specific scientific questions about the underlying astronomical and physical processes. The CHASC Center aims not only to develop new methods for astronomy but also to use these problems as springboards for the development of new general statistical methods, especially in signal processing, image analysis, multilevel modeling, and computational statistics.

The CHASC International Center for Astrostatistics plans to tackle these challenges using principled statistical methods that incorporate both data-driven and science-driven approaches. For example, the investigators will use coarse data-driven models in an initial analysis to identify simple structures that can then be used in a more scientifically meaningful secondary analysis. To formally test for unexpected features in astrophysical images, the team will use flexible data-driven models for deviations from science-driven models of known features. As these examples illustrate, modern astrostatistical analyses involve subtle tradeoffs between complexity and practicality and pose significant computational challenges. A primary aim of this project is to produce tailored Monte Carlo methods that are efficient in such complex settings. The team's statisticians (Meng, van Dyk, Lee, and Stein) have substantial research experience in developing the methods that the Center will extend, employ, and publicize to tackle these challenges: inferential and efficient computational methods under highly structured models that involve multi-scale structure and/or multiple levels of latent variables and incomplete data. Such models are ideally suited to account for the many physical and instrumental filters in the data generation mechanisms of astrophysics. The astronomers (Kashyap and Siemiginowska) have expertise in the instrumentation and science of high-energy and optical astronomy, and have collaborated with statisticians in developing methods to address scientific questions. The research is expected to have two fundamental impacts. First, it should lead to more general acceptance and use of appropriate statistical methods among astronomers. Second, the development of methods for efficient modeling of scientific phenomena, the comparison of complex models, and science-driven classification and clustering will help solve complex data analytic challenges throughout the natural, social, medical, and engineering sciences.

Agency: National Science Foundation (NSF)
Institute: Division of Mathematical Sciences (DMS)
Application #: 1513492
Program Officer: Gabor Szekely
Budget Start: 2015-07-01
Budget End: 2018-06-30
Fiscal Year: 2015
Total Cost: $87,499
Name: Harvard University
City: Cambridge
State: MA
Country: United States
Zip Code: 02138