Computational scientists who routinely generate large data sets face two major challenges. The first is deciding which data are most essential for analysis, given that only a small fraction can be retained. The second is transforming these data into information that conveys the most insight. As the size of simulation output continues to grow, the "save the data first, analyze them later" approach needs to be completely replaced with more aggressive data prioritization and reduction before any analysis is done. This project develops core data analytics technologies to facilitate effective data summarization, indexing, and triage for large-scale flow data. Fluid flow plays an important role in explaining many phenomena across a wide range of disciplines. To give scientists a succinct view of the data content, and to organize the data and features by similarity and complexity, a graph-based model is developed that both reveals the major structure of the flow field and facilitates high-performance, out-of-core flow line computation. We develop statistical and geometrical complexity measures for flow lines to group and prioritize sub-regions of the vector field, allowing efficient data access. To characterize the temporal complexity of flow fields, we develop time-varying analysis algorithms that allow more detailed analysis of the data and provide the user with a flexible interface to quickly identify salient features.

The development of the proposed integrated flow analysis and visualization framework initially targets two applications: simulations of turbomachinery in aerodynamics, and the study of the Madden-Julian Oscillation in climate modeling. Typical flow in turbomachinery is full of evolving shocks and vortical structures, so visualization lets designers locate loss regions and complex flow features in a relatively short amount of time, provided these features can be identified automatically. Because the Madden-Julian Oscillation is strongly related to the convection of air, the flow analysis techniques developed under this project can be used to identify and track its locations and durations. Because the size of data generated by time-varying simulations can be prohibitively large, the proposed time-varying data reduction techniques allow scientists to focus on the most salient portion of the data. The key impact of this project is to make available a working and attractive solution that helps scientists comprehend the vast amount of data generated by large-scale simulations. Through close collaboration with application scientists, the research ideas developed in this project will be transformed into an open source software framework.

Agency: National Science Foundation (NSF)
Institute: Division of Information and Intelligent Systems (IIS)
Type: Standard Grant (Standard)
Application #: 1250752
Program Officer: Almadena Chtchelkanova
Project Start:
Project End:
Budget Start: 2013-09-15
Budget End: 2018-08-31
Support Year:
Fiscal Year: 2012
Total Cost: $727,258
Indirect Cost:

Name: Ohio State University
Department:
Type:
DUNS #:
City: Columbus
State: OH
Country: United States
Zip Code: 43210