High-fidelity scientific simulations of natural phenomena, such as massive star explosion, hurricane movement, or fusion energy production increasingly become not only compute-intensive but also data-intensive. "Big" data produced by large-scale simulations calls for sustainable data management solutions that increase energy efficiency, enable predictive data analytics, and reduce time-to-discovery. In response to this critical need, Adaptive Mesh Refinement (AMR) has emerged as a promising technology to achieve considerable savings in memory, computation, and storage resources, while maintaining or even increasing simulation accuracy. With AMR, scientists can adaptively balance between high-resolution look into "the unusual" areas and low-precision glance at "the usual" areas of the simulated phenomena.

The strength of AMR selective fine-grain refinement of salient areas of the simulationis also becoming its weakness. The non-uniform structure of its underlying mesh leads to inefficient post-simulation access to the "Big" data along the path from data analytics to scientific discovery. The data generation process of space-time simulation proceeds from one time step to the next and requires the context of only two time steps, while storing data for only one time step on the disk. In contrast, visual data analytics often requires the full context of the available data, not just a single time step. In fact, simulations that are driven by local space-time relationships are largely performed with the purpose of discovering or explaining non-local and large-scale space-time relationships through interactive "what-if" data exploration. Thus, the fundamental differences in data context and heterogeneity of access patterns call for analytics-driven scientific data management solutions.

To close this gap, this project aims to significantly advance database management for "Big" AMR data by making data analytics and data reduction the first class citizens of the database design and query processing. To support analytics-driven efficient query processing, it offers a transformative shift from the traditional indexing of data to the proposed indexing of information about data compression and hierarchical data layout in storage. The former incurs substantial storage overhead, while the latter is anticipated to be storage-friendly with significantly faster data retrieval in response to user queries.

To realize this ambitious goal, innovations are proposed in three broad areas of data-intensive computer science: semantic modeling, scientific database management, and high performance storage. Specifically, the scientific database management is going to be cognizant of rich data access-and-query semantics specific to AMR data, capable of in situ data compression and indexing during data generation by the simulation, and optimized for heterogeneous access patterns through novel data layout methodologies.

This research is being conducted in a close partnership with the three national labs' research teams with pioneering technologies in high performance storage, AMR, and high-fidelity climate change simulations. The approaches and formalisms developed in this research are expected to be applicable to a broad range of scientific and engineering problems that utilize simulation models for predictive understanding of physical processes through "Big" data analytics. The techniques developed under this project are expected to accelerate the crucial process of data-driven exploration and knowledge discovery in the scientific workflow. This project also plans significant efforts in "Big" data education, diversity, community engagement, and the dissemination of results through academic publications and open-source software.

Agency
National Science Foundation (NSF)
Institute
Division of Computer and Communication Foundations (CCF)
Type
Standard Grant (Standard)
Application #
1240682
Program Officer
Almadena Chtchelkanova
Project Start
Project End
Budget Start
2012-05-01
Budget End
2015-12-31
Support Year
Fiscal Year
2012
Total Cost
$149,999
Indirect Cost
Name
North Carolina State University Raleigh
Department
Type
DUNS #
City
Raleigh
State
NC
Country
United States
Zip Code
27695