Cutting-edge science has become extremely data-intensive, and the inability to store and analyze multi-terabyte datasets today (and petabyte datasets tomorrow) increasingly limits scientific progress. One important, long-standing problem, described by Feynman as the most important unsolved problem of classical physics, is turbulence in magnetized and unmagnetized fluids. A formidable obstacle is that the intrinsic physics of the turbulent cascade takes place in a Lagrangian frame, so that fluid particles must be tracked backward in time. The same requirement arises in other fields, such as cosmogenesis, where galaxies must be traced back through past encounters.
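The backward-tracking requirement can be pictured with a minimal sketch: once velocity fields from every time step are available, a particle trajectory can be integrated in reverse from its final position to recover where the fluid element came from. The analytic velocity field and step sizes below are hypothetical stand-ins for data that would be read from the archived simulation.

```python
# Minimal sketch of backward-in-time particle tracking. The analytic
# velocity u(x, t) is a hypothetical stand-in for velocity fields that
# would be retrieved from the stored time steps of the simulation.

def u(x, t):
    """Hypothetical 1-D velocity field (stand-in for archived data)."""
    return 0.5 * x + 0.1 * t

def track_backward(x_final, t_final, n_steps, dt):
    """Integrate dx/dt = u(x, t) in reverse with explicit Euler steps,
    recovering the particle's earlier positions."""
    x, t = x_final, t_final
    path = [(t, x)]
    for _ in range(n_steps):
        x -= dt * u(x, t)  # step backward in time
        t -= dt
        path.append((t, x))
    return path

# Trace a particle from its position at t = 1.0 back to t = 0.
path = track_backward(x_final=1.0, t_final=1.0, n_steps=100, dt=0.01)
t0, x0 = path[-1]
```

In a production setting the Euler step would be replaced by a higher-order integrator, and each evaluation of `u` would become an indexed read against the database of stored snapshots.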
The present research project will develop a new approach to large-scale numerical simulations: saving almost every time step in a database, thus permitting subsequent analysis by relatively simple computations, albeit over very large volumes of data. The new idea, which has already been tested, is to avoid the bottleneck of moving data to compute nodes and instead to combine low-level, indexed data access with processing that runs inside the database itself. This method permits the intended novel assault on the problem of turbulence, combining a forefront science project with a demonstration of the simulation environment of the future. It requires the development of tools and procedures to organize the data in multiple formats, each suited to a particular kind of statistical analysis, running either inside or closely integrated with the database.
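One way to picture the combination of low-level indexed access and in-database computation is a space-filling-curve index: nearby grid points receive nearby keys, so a spatial lookup touches a compact key range instead of scanning the dataset. The Morton (Z-order) key below is a common choice for such indexes; the tiny in-memory table is a hypothetical stand-in for the real database, and the grid values are synthetic.

```python
# Sketch of a spatial index over simulation grid points using a
# Morton (Z-order) key. The dict 'table' is a hypothetical stand-in
# for the database; real systems store key-sorted blocks on disk.

def morton3d(i, j, k, bits=10):
    """Interleave the bits of grid indices (i, j, k) into one key."""
    key = 0
    for b in range(bits):
        key |= ((i >> b) & 1) << (3 * b)
        key |= ((j >> b) & 1) << (3 * b + 1)
        key |= ((k >> b) & 1) << (3 * b + 2)
    return key

# "Load" one synthetic velocity snapshot into the index.
table = {morton3d(i, j, k): (float(i), float(j), float(k))
         for i in range(4) for j in range(4) for k in range(4)}

# A point query becomes a single indexed key access, executed where
# the data lives rather than after shipping it to a compute node.
vx, vy, vz = table[morton3d(2, 1, 3)]
```

Because the Z-order curve keeps spatially adjacent points in nearby key ranges, neighborhood queries (e.g., velocity gradients for cascade statistics) read contiguous stretches of the index rather than random locations.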
The resulting improved understanding of magnetohydrodynamic turbulence will have a broad impact throughout astrophysics, and the new approach and concrete tools to be created will benefit most areas of data-intensive science.