With the exponential increase in the size, complexity, and rate of acquisition of diverse types of data, there is an urgent need for new techniques for managing and analyzing such data. In this context, there is a critical need for benchmarks to facilitate the evaluation of alternative solutions and provide for comparisons among different solution approaches targeted at big data applications. Benchmarks need to capture a variety of characteristics of big data storage, management, and analytics, including new feature sets, enormous data sizes, large-scale and evolving system configurations, shifting loads, and the heterogeneous technologies of big data and cloud platforms. Existing benchmarks are inadequate for assessing emerging big data platforms, systems, and software such as SQL, NoSQL, and the Hadoop software ecosystem; different modalities or genres of big data, including graphs, streams, scientific data, document collections, and transaction data; new hardware options, including HDD vs. SSD, different types of HDD, SSD, and main memory, and large-memory systems; and new platform options, including dedicated commodity clusters and cloud platforms.

The Workshop on Big Data Benchmarking 2012 represents an important step toward the development of a suite of benchmarks providing objective measures of the effectiveness of hardware and software systems for big data applications. The objective of this invitation-only workshop is to identify key issues and to launch an activity around the definition of reference benchmarks that capture the essence of big data application scenarios. The effort aims to arrive at a set of objective measures and benchmark datasets to characterize and compare the performance, and the price/performance tradeoffs, of alternative solutions for big data storage, retrieval, processing, and analysis problems. The workshop brings together a group of about 40 experts from academia and industry with backgrounds in big data, database systems, benchmarking and system performance, cloud storage and computing, and related areas. The industries represented include hardware, software, analytics, and applications. The group will develop a draft report describing a big data benchmark suite that will be widely disseminated on the web and through presentations and outreach activities at relevant conferences and workshops.

Broader Impacts: The availability of the big data benchmark suite will facilitate research and technological advances by providing objective measures for comparing alternative solutions to key big data problems.

Project Report

The First Workshop on Big Data Benchmarking (WBDB2012), held on May 8-9, 2012 in San Jose, CA, served as an incubator for several promising approaches toward defining a big data benchmark standard for industry. The meeting was attended by about 60 participants representing about 45 different organizations from industry and academia. Through an open forum for discussion of issues related to big data benchmarking, including definitions of big data terms, benchmark processes, and auditing, attendees were able to broaden their views of big data benchmarking and communicate their own ideas, which ultimately led to the formation of small working groups to continue collaborative work in this area. Workshop attendees were selected based on their experience and expertise in the management of big data, database systems, performance benchmarking, and big data applications. There was consensus among participants about both the need and the opportunity for defining benchmarks that capture the end-to-end aspects of big data applications. It was felt that big data benchmarks should follow the model adopted by industry's existing Transaction Processing Performance Council (TPC) benchmarks: they should include metrics not only for performance but also for price/performance, along with a sound foundation for fair comparison through audit mechanisms. Additionally, the benchmarks should consider several costs relevant to big data systems, including total cost of acquisition, setup cost, and total cost of ownership, including energy cost. The first WBDB workshop has been followed by a second workshop held in December 2012 in Pune, India, and a third workshop held in July 2013 in Xi’an, China.

Agency: National Science Foundation (NSF)
Institute: Division of Information and Intelligent Systems (IIS)
Type: Standard Grant (Standard)
Application #: 1241838
Program Officer: Sylvia Spengler
Project Start:
Project End:
Budget Start: 2012-06-01
Budget End: 2013-05-31
Support Year:
Fiscal Year: 2012
Total Cost: $15,000
Indirect Cost:
Name: University of California San Diego
Department:
Type:
DUNS #:
City: La Jolla
State: CA
Country: United States
Zip Code: 92093