Scientific Bigdata sets are becoming too large and complex to fit in RAM, forcing scientific applications to perform a lot of slow disk and network I/O. This growth also makes scientific data more vulnerable to corruptions due to crashes and human errors. This project will use recent results from algorithms, database, and storage research to improve the performance and reliability of standard scientific data formats. This will make scientific research cheaper, faster, more reliable, and more reproducible.

The Hierarchical Data Format (HDF5) standard is a container format for scientific data. It allows scientists to define and store complex data structures inside HDF5 files. Unfortunately, the current standard forces users to store all data objects and their meta-data properties inside one large physical file; this mix hinders meta-data-specific optimizations. The current storage also uses data-structures that scale poorly for large data. Lastly, the current model lacks snapshot support, important for recovery from errors.

A new HDF5 release allows users to create more versatile storage plugins to control storage policies on each object and attribute. This project is developing support for snapshots in HDF5, designing new data structures and algorithms to scale HDF5 data access on modern storage devices to Bigdata. The project is designing several new HDF5 drivers: mapping objects to a Linux file system; storing objects in a database; and accessing data objects on remote Web servers. These improvements are evaluated using large-scale visualization applications with Bigdata, stemming from real-world scientific computations.

Agency
National Science Foundation (NSF)
Institute
Division of Information and Intelligent Systems (IIS)
Type
Standard Grant (Standard)
Application #
1251095
Program Officer
Sylvia Spengler
Project Start
Project End
Budget Start
2013-07-01
Budget End
2017-04-30
Support Year
Fiscal Year
2012
Total Cost
$150,000
Indirect Cost
Name
Louisiana State University
Department
Type
DUNS #
City
Baton Rouge
State
LA
Country
United States
Zip Code
70803