Data and I/O availability is an increasing concern in today's large data centers, where both data volume and complexity are growing dramatically. Most existing solutions rely on multi-replication techniques to provide data redundancy, replicating data chunks across storage server nodes. However, multi-replication is insufficient for managing big data: efficiently replicating N copies of a data set of tens to hundreds of petabytes is prohibitively expensive. As an alternative, erasure codes that tolerate multiple failures can provide reliability and availability at much lower storage cost. The biggest challenge in using erasure codes to manage big data, however, is the performance penalty of the complex encoding/decoding operations, which limits the adoption of erasure codes in large-scale data centers.
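
To make the cost contrast concrete, the Python sketch below shows the simplest possible erasure code, a single XOR parity chunk. This is purely illustrative and is not the project's scheme (which targets codes tolerating multiple failures): k data chunks plus one parity chunk survive the loss of any one chunk at (k+1)/k storage overhead, whereas even 2-way replication needs 2x storage for the same single-failure guarantee.

    # Illustrative sketch only: single XOR parity, the simplest erasure code.
    from functools import reduce

    def xor_chunks(chunks):
        """Byte-wise XOR of equal-length chunks."""
        return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*chunks))

    def encode(data_chunks):
        """Append one XOR parity chunk to the k data chunks."""
        return data_chunks + [xor_chunks(data_chunks)]

    def recover(stripe, lost_index):
        """Rebuild the chunk at lost_index from the surviving chunks."""
        survivors = [c for i, c in enumerate(stripe) if i != lost_index]
        return xor_chunks(survivors)

    # 4 data chunks + 1 parity: 1.25x overhead, tolerates any one failure.
    data = [b"AAAA", b"BBBB", b"CCCC", b"DDDD"]
    stripe = encode(data)
    assert recover(stripe, 2) == b"CCCC"

Production systems use codes such as Reed-Solomon, which tolerate m failures with k+m chunks, but the encoding/decoding there involves finite-field arithmetic rather than plain XOR, which is exactly the performance cost the abstract above refers to.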

This project develops cost-effective techniques that exploit erasure codes to achieve high availability and enhance performance in large data centers, enabling efficient management of big data through several research innovations. It cohesively investigates how to use appropriate spatial cost and system/architecture techniques to improve the overall data access performance of server clusters built on erasure codes. This research makes fundamental contributions that pave the way for efficiently deploying data centers using erasure codes, and it has the potential to benefit numerous data-intensive big data applications such as online search, social networking, e-business, and health care.

Agency: National Science Foundation (NSF)
Institute: Division of Computer and Network Systems (CNS)
Type: Standard Grant (Standard)
Application #: 1218960
Program Officer: Marilyn McClure
Budget Start: 2012-09-01
Budget End: 2016-11-30
Fiscal Year: 2012
Total Cost: $431,984
Name: Virginia Commonwealth University
City: Richmond
State: VA
Country: United States
Zip Code: 23298