The goal of this research project is to understand the tradeoffs between the MapReduce and parallel DBMS approaches to performing large-scale data analysis over large clusters of computers, and to bring together ideas from both communities. Both MapReduce and parallel database systems provide scalable data processing over hundreds to thousands of nodes, and both offer a stylized, high-level programming environment that lets users efficiently filter and combine datasets while masking much of the complexity of parallelizing computation over a cluster. They differ in substantial ways as well, however, including their approaches to fault tolerance, their data modeling requirements, their query flexibility, and their ability to function in a heterogeneous processing environment.
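As a rough illustration of the MapReduce programming style, the sketch below expresses a simple filter-and-count task as a Hadoop job: the map phase filters records and the reduce phase combines them. The pipe-delimited record layout (sourceIP, destURL, adRevenue), the revenue threshold, and all class names are hypothetical and chosen only for illustration; they are not taken from the project's benchmark.

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class SourceVisitCount {

      // Map phase: filter records, emitting (sourceIP, 1) for rows above a revenue threshold.
      public static class FilterMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text outKey = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context ctx)
            throws IOException, InterruptedException {
          // Hypothetical layout: sourceIP|destURL|adRevenue (a real job would handle malformed lines).
          String[] fields = value.toString().split("\\|");
          if (fields.length == 3 && Double.parseDouble(fields[2]) > 10.0) {
            outKey.set(fields[0]);
            ctx.write(outKey, ONE);
          }
        }
      }

      // Reduce phase: combine the filtered records by summing counts per sourceIP.
      public static class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context ctx)
            throws IOException, InterruptedException {
          int sum = 0;
          for (IntWritable v : values) sum += v.get();
          ctx.write(key, new IntWritable(sum));
        }
      }

      public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "filter-and-count");
        job.setJarByClass(SourceVisitCount.class);
        job.setMapperClass(FilterMapper.class);
        job.setCombinerClass(SumReducer.class);  // local pre-aggregation before the shuffle
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
      }
    }

The framework handles partitioning the input, shuffling intermediate pairs, and restarting failed tasks; the user supplies only the map and reduce logic.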

This multi-university team of researchers is investigating how these differences affect the performance and scalability of the two approaches. The team is running experiments that compare an open-source MapReduce implementation (Hadoop) to two commercial parallel database systems (DB2 and Vertica) on a benchmark whose tasks are designed to expose the tradeoffs between the approaches. The goal is to determine which differences between the two approaches to large-scale data analysis reflect fundamental tradeoffs, and which can be reconciled within a single system, so that ideas from one community can benefit the other.
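For contrast, and again purely as a sketch rather than the project's benchmark code, the same hypothetical filter-and-count task can be expressed against a parallel DBMS as a single declarative SQL query submitted over JDBC; the connection string, table, and column names below are placeholders.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class SourceVisitCountSql {
      public static void main(String[] args) throws Exception {
        // args[0] is a JDBC URL for the parallel DBMS (placeholder).
        try (Connection conn = DriverManager.getConnection(args[0]);
             Statement stmt = conn.createStatement();
             // The filter (WHERE) and combine (GROUP BY) are declared once; the DBMS
             // plans the distributed scan, shuffle, and aggregation itself.
             ResultSet rs = stmt.executeQuery(
                 "SELECT sourceIP, COUNT(*) AS visits " +
                 "FROM uservisits WHERE adRevenue > 10.0 " +
                 "GROUP BY sourceIP")) {
          while (rs.next()) {
            System.out.println(rs.getString("sourceIP") + "\t" + rs.getLong("visits"));
          }
        }
      }
    }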

Agency: National Science Foundation (NSF)
Institute: Division of Information and Intelligent Systems (IIS)
Type: Standard Grant (Standard)
Application #: 0843487
Program Officer: Vijayalakshmi Atluri
Project Start:
Project End:
Budget Start: 2009-02-01
Budget End: 2012-01-31
Support Year:
Fiscal Year: 2008
Total Cost: $109,506
Indirect Cost:
Name: University of Wisconsin Madison
Department:
Type:
DUNS #:
City: Madison
State: WI
Country: United States
Zip Code: 53715