Astrophysical simulation codes, as well as astrophysical data, have grown increasingly complex: simulations are now able to probe the formation of stars, galaxies, black holes and many other disparate phenomena. However, as simulations become larger, and the physical models governing those simulations become more complex, the inherent difficulties in developing simulation codes and analyzing the data output from these simulations grow commensurately. Many different, often competing, groups utilize independent simulation platforms, and the resulting technical incompatibilities prevent substantial collaboration. To mitigate this fragmentation, this research involves the creation of an Integrated Science Environment for both astrophysical computation and visualization. This Integrated Science Environment is designed to serve new users just as well as petascale simulations run on platforms such as the NSF-funded Blue Waters system.

The Integrated Science Environment produced by this research is built from three primary components: a simulation platform interface, an initial conditions generator, and an analysis and visualization engine. The simulation platform interface abstracts the internal data structures and unit handling of individual simulation platforms into physically relevant quantities, providing a compatibility layer that enables microphysical solvers (such as chemistry, radiative cooling, and hydrodynamics) to be applied to multiple platforms unmodified. The initial conditions generator creates the starting points for astrophysical simulations, enabling both intuitive generation of initial conditions and straightforward cross-code comparison and validation of results. Finally, the analysis engine produces high-quality quantitative results and visualizations, including planetarium-quality visualization tools. The Integrated Science Environment is fully MPI-parallelized and written primarily in Python, with APIs that allow in situ or concurrent analysis and visualization during the course of a simulation; it scales to hundreds of thousands of processors.
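As an illustration of the kind of abstraction the simulation platform interface provides, the following minimal sketch uses the public Python API of yt (the toolkit described in the project report below) to load a simulation output and query a physically meaningful, unit-aware field without reference to the code's internal data layout. The dataset path is hypothetical, and the exact field and unit names may differ between yt versions.

    import yt

    # yt.load inspects the output and selects the matching code frontend
    # (Enzo, FLASH, and so on), so the same script can be pointed at
    # outputs from different simulation platforms.
    ds = yt.load("galaxy0030/galaxy0030")  # hypothetical dataset path

    # Select the full domain and ask for a physical quantity; the returned
    # array carries units that can be converted on request, rather than
    # depending on each code's internal unit system.
    ad = ds.all_data()
    density = ad[("gas", "density")].in_units("g/cm**3")
    print(density.min(), density.max())

Because unit conversion and field lookup live in this interface layer, analysis modules written against it need not be modified when applied to data from a different simulation code.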

Project Report

Understanding astrophysical phenomena -- such as how stars form, how galaxies evolve over their lifetimes, and how supernovae explode -- requires conducting complex, multi-component computer simulations. These simulations are conducted on platforms ranging from single-processor laptops to the largest supercomputers in the world, where several hundred thousand processors work in concert to compute the interplay of chemistry, hydrodynamics, radiative transfer and gravity. These simulations are built on computational models based on our understanding of physics at small scales, and they provide outputs that help us understand and contextualize telescope observations. Rather than using a single method of conducting these simulations, however, researchers often utilize different computational platforms, developing independent tools to verify the results of other simulations. Each of these platforms brings with it different underlying assumptions about how data should be stored, the mechanisms for accessing that data, and even the types of data that are stored during the course of a simulation. This results in something of a conceptual language barrier, which prevents researchers from directly comparing results and poses a large obstacle to early-stage researchers transitioning between research groups and code platforms. Furthermore, there was no single target against which analysis routines applicable across all simulation codes could be written. The efficiency with which researchers conduct simulations and process and understand the resulting data depends on their ability to describe the scientific questions they need to ask in meaningful ways, without having to translate those questions into the data-storage assumptions of specific simulation codes.

The open source analysis and visualization toolkit "yt" was originally designed to provide a mechanism for "asking questions of data" for one particular simulation platform, the Enzo code. Instead of requiring researchers to describe file opening, unit conversion, data selection and processing in detailed, fine-grained ways, it provided a high-level mechanism for describing data in terms of its physical meaning, with a minimum of technical overhead. However, during the course of this project, yt was further developed to support many different simulation code formats. This has resulted in a dramatic increase in collaboration between scientists: rather than targeting individual simulation data formats, they can simply target yt when developing analysis modules. Such modules have included simulated telescope observations, advanced visualization methods, and even tools for creating scientific communication materials for planetariums. The yt toolkit has been deployed on computing platforms ranging from the very small (laptops) to the very, very large (supercomputers). This has directly increased research productivity, reducing duplication of effort between research groups and increasing collaboration between scientists. In the course of this development, yt has also been extended to generate initial conditions for simulations, so that new problems can be explored easily. Furthermore, yt has developed capabilities for conducting actual simulations as repeatable experiments; chemical rate solvers can be run directly on data in memory, allowing experimentation and exploration without the full cost of a simulation.
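A short sketch of what "asking a question of the data" looks like in practice is given below, again using the public yt Python API. The dataset name is hypothetical and the script is illustrative rather than drawn from the project itself; the point is that the question ("how much gas mass lies within 100 kpc of the densest point?") is phrased in physical terms, with no explicit file I/O, unit bookkeeping, or grid traversal.

    import yt

    ds = yt.load("DD0046/DD0046")  # hypothetical dataset name

    # Locate the densest point in the simulation volume.
    value, center = ds.find_max(("gas", "density"))

    # Select a 100 kpc sphere around it and ask for the total gas mass inside.
    sphere = ds.sphere(center, (100.0, "kpc"))
    total_mass = sphere.quantities.total_quantity(("gas", "cell_mass"))
    print(total_mass.in_units("Msun"))

    # The same high-level interface drives visualization.
    yt.ProjectionPlot(ds, "z", ("gas", "density")).save()

Because a script like this runs identically on any format yt can read, the same analysis module can be shared between groups using different simulation codes, which is the collaboration benefit described above.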
As computational power increases, the commensurate increase in both the rate at which data is generated and the volume of that data has been difficult to manage. Computer simulations generate vastly more data than can be stored; rather than storing this data and analyzing it after the simulation has been conducted, in many cases it is now necessary to store only a small subset of the data generated and instead to conduct the majority of the analysis while the simulation is running, without writing data to disk. To this end, yt has been instrumented to receive data from a running simulation, so that data generated during a simulation can be analyzed with the same techniques, tools and, in many cases, the same scripts as are used for post-processing. This will enable researchers to address future generations of simulations without losing efficiency as data sizes and complexity grow.

Finally, driving all of these developments forward are fundamental questions that can at present only be answered through simulation. How do galaxies form, grow and evolve over cosmic time? How do the first stars and galaxies form? Can we find remnants of these stars and galaxies in our local Universe today? During the course of this project, we investigated the amplification of magnetic fields in the first stars, the enrichment of the neighborhoods of the first galaxies with heavy elements forged in their supernovae, and the birth and growth of galaxies. These results will help us understand low-metallicity stars in our galaxy, the observations of next-generation telescopes, and the distribution of elements surrounding galaxies.
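As a concrete illustration of the in-memory analysis pathway described above, the sketch below wraps arrays that already live in memory -- standing in here for data handed over by a running simulation -- as a yt dataset and analyzes them with the same calls used for on-disk outputs. The random field and the unit choices are placeholders; actual coupling to a simulation code passes the code's own live data rather than random arrays.

    import numpy as np
    import yt

    # A stand-in for a field that a running simulation would hand to yt.
    arr = np.random.random(size=(64, 64, 64))
    data = {"density": (arr, "g/cm**3")}

    # Wrap the in-memory arrays as a yt "stream" dataset; nothing is written
    # to disk at any point.
    ds = yt.load_uniform_grid(data, arr.shape, length_unit="Mpc")

    # The same post-processing style of analysis then applies directly.
    ad = ds.all_data()
    print(ad.quantities.extrema(("gas", "density")))
    yt.SlicePlot(ds, "z", ("gas", "density")).save("in_memory_slice.png")

Because this interface is identical to the on-disk case, scripts developed for post-processing can be reused for concurrent, in situ analysis with little or no modification.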

Agency: National Science Foundation (NSF)
Institute: Division of Advanced CyberInfrastructure (ACI)
Type: Standard Grant (Standard)
Application #: 1048505
Program Officer: Sushil K Prasad
Project Start:
Project End:
Budget Start: 2011-01-01
Budget End: 2013-12-31
Support Year:
Fiscal Year: 2010
Total Cost: $240,000
Indirect Cost:
Name: Turk Matthew J
Department:
Type:
DUNS #:
City: La Jolla
State: CA
Country: United States
Zip Code: 92037