BIGDATA: Small: DCM: Collaborative Research: An efficient, versatile, scalable, and portable storage system for scientific data containers

Tao, Jian; Benger, Werner; Benger, Werner; Li, Xin

Abstract

Scientific Bigdata sets are becoming too large and complex to fit in RAM, forcing scientific applications to perform a lot of slow disk and network I/O. This growth also makes scientific data more vulnerable to corruptions due to crashes and human errors. This project will use recent results from algorithms, database, and storage research to improve the performance and reliability of standard scientific data formats. This will make scientific research cheaper, faster, more reliable, and more reproducible.

The Hierarchical Data Format (HDF5) standard is a container format for scientific data. It allows scientists to define and store complex data structures inside HDF5 files. Unfortunately, the current standard forces users to store all data objects and their meta-data properties inside one large physical file; this mix hinders meta-data-specific optimizations. The current storage also uses data-structures that scale poorly for large data. Lastly, the current model lacks snapshot support, important for recovery from errors.

A new HDF5 release allows users to create more versatile storage plugins to control storage policies on each object and attribute. This project is developing support for snapshots in HDF5, designing new data structures and algorithms to scale HDF5 data access on modern storage devices to Bigdata. The project is designing several new HDF5 drivers: mapping objects to a Linux file system; storing objects in a database; and accessing data objects on remote Web servers. These improvements are evaluated using large-scale visualization applications with Bigdata, stemming from real-world scientific computations.

Funding Agency

Agency: National Science Foundation (NSF)
Institute: Division of Information and Intelligent Systems (IIS)
Type: Standard Grant (Standard)
Application #: 1251095
Program Officer: Sylvia Spengler

Project Start
Project End
Budget Start: 2013-07-01
Budget End: 2017-04-30
Support Year
Fiscal Year: 2012
Total Cost: $150,000
Indirect Cost

BIGDATA: Small: DCM: Collaborative Research: An efficient, versatile, scalable, and portable storage system for scientific data containers
Tao, Jian Benger, Werner Benger, Werner Li, Xin
Louisiana State University, Baton Rouge, LA, United States

Abstract

Funding Agency

Institution

Comments

Recent in Grantomics:

Recently viewed grants:

Recently added grants:

Abstract

Funding Agency

Institution

Comments