Data and I/O availability is an increasing concern in today's large data centers, where both data volume and complexity are growing dramatically. Most existing solutions rely on multi-replication techniques for redundancy, replicating data chunks across storage server nodes. However, multi-replication is insufficient for managing big data: efficiently replicating N copies of a data set spanning tens to hundreds of petabytes is impractical. As an alternative, erasure codes that tolerate multiple failures can provide reliability and availability at much lower storage cost. The biggest challenge in using erasure codes to manage big data, however, is performance: the complex encoding and decoding operations limit the application of erasure codes in large-scale data centers.
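The storage-cost tradeoff can be illustrated with the simplest possible erasure code, a single XOR parity in the spirit of RAID-5. This is only an illustrative sketch, not the codes studied in this project: storing k data chunks plus one parity chunk tolerates any single chunk loss at (k+1)/k times the raw data size, versus N times for N-way replication.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))

def encode(chunks: list[bytes]) -> bytes:
    """Compute one parity chunk as the XOR of all k data chunks."""
    parity = bytes(len(chunks[0]))  # all-zero chunk of the same size
    for c in chunks:
        parity = xor_bytes(parity, c)
    return parity

def recover(survivors: list[bytes], parity: bytes) -> bytes:
    """Rebuild a single lost data chunk by XORing the survivors with the parity."""
    lost = parity
    for c in survivors:
        lost = xor_bytes(lost, c)
    return lost

# k = 3 data chunks: total storage is 4 chunks (1.33x overhead),
# whereas triple replication would need 9 chunks (3x overhead).
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = encode(data)

# Simulate losing the middle chunk and rebuilding it.
rebuilt = recover([data[0], data[2]], parity)
assert rebuilt == b"BBBB"
```

Production systems use Reed-Solomon or similar codes that tolerate multiple simultaneous failures; the decode step there involves finite-field arithmetic, which is the source of the encoding/decoding cost mentioned above.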
This project develops cost-effective techniques that exploit erasure codes to achieve high availability and enhance performance in large data centers, enabling efficient management of big data through several research innovations. It cohesively investigates how to combine modest spatial cost with system- and architecture-level techniques to improve the overall data-access performance of server clusters built on erasure codes. This research makes fundamental contributions that pave the way to efficiently deploying erasure-coded data centers, and it has the potential to benefit numerous data-intensive big data applications such as online search, social networking, e-business, and health care.