Researchers and decision makers in fields as diverse as fraud detection, genome sequencing, and datacenter management need to process many terabytes of data every day. Many of these fields are turning to MapReduce systems to process such growing datasets. Consequently, the relatively young MapReduce ecosystem has to support complex workloads that include declarative queries for report generation, MapReduce programs for machine learning tasks, and large job workflows. Furthermore, elastic, pay-as-you-go cloud platforms pose novel challenges and opportunities for MapReduce workload management.

This project is building the Hadoop AutoAdmin system for automating MapReduce workload management. To the PI's knowledge, Hadoop AutoAdmin is the first system to address this challenging problem, which will become increasingly important as a broad class of users adopts MapReduce. Hadoop AutoAdmin has three research thrusts. The first thrust is to understand and characterize the behavior of MapReduce workloads through a comprehensive empirical study involving workloads and data from multiple application domains, as well as different cluster configurations in the cloud. The second thrust is to develop an easy-to-use and efficient warehouse to store, retrieve, and visualize the diverse forms of workload monitoring data. The models and insights from these two activities will drive the third thrust: developing end-to-end algorithms for workload management.

This project can have significant impact in areas of national importance, such as security and healthcare, that are inundated with data. Hadoop AutoAdmin will improve worker productivity, system utilization, and the cost-effectiveness of cloud platforms. The technical contributions will be disseminated broadly, and the system will be released publicly.

Agency: National Science Foundation (NSF)
Institute: Division of Computer and Network Systems (CNS)
Type: Standard Grant (Standard)
Application #: 1218981
Program Officer: M. Mimi McClure
Budget Start: 2012-09-01
Budget End: 2016-08-31
Fiscal Year: 2012
Total Cost: $315,971
Name: Duke University
City: Durham
State: NC
Country: United States
Zip Code: 27705