Exploratory data analysis plays a key role in data-driven discovery across a wide range of domains, including science, engineering and business. For data analysis to become a commodity at a time when its user base is continually expanding and diversifying, human productivity and ease of use must become first-class design considerations for any database system. Unfortunately, data tools that are user friendly and designed to improve human productivity are still sorely lacking. This project will enable users at different skill levels to interact with and explore their large datasets far more easily and quickly than they can today. Rather than requiring users to spend precious time building complex analytics tasks, this work will offer a more agile, responsive and user-friendly system based on direct manipulation of visual representations (e.g., charts, graphs, maps) of the data sets and analysis results. The system can also be used as a learning tool: for example, a teacher could walk students through a complex dataset to verify a specific hypothesis. This project will make large-scale data exploration accessible to more users. Overall, it will accelerate discovery and breakthroughs in many domains, such as e-commerce, finance and science. This research will be incorporated into undergraduate and graduate coursework. The outreach activities include special research- and education-focused programs geared towards undergraduates and high school girls.

This project will build a new class of database systems designed for Human-In-the-Loop (HIL) operation. The work targets an ever-growing set of data-centric applications in which users directly manipulate, analyze and explore large data sets, often using complex analytics and machine learning techniques. Traditional database technologies are ill suited to serve this purpose. Historically, databases assumed (1) text-based input (e.g., SQL) and output, (2) a point (i.e., stateless) query-response paradigm, (3) batch results, and (4) simple analytics. The project team will drop these fundamental assumptions and build a system that instead supports visual input and output, "conversational" interaction, early and progressive results, and complex analytics. Building a system that integrates these features requires a complete rethinking of the full data stack, from the visual interface to the "core", as well as the incorporation of pertinent algorithms. The primary research challenges revolve around developing algorithms and optimizations that leverage the unique characteristics of HIL workloads to speed up analysis over large data collections. The team will build a proof-of-concept HIL database called 20/20 that will tightly integrate and significantly extend two existing technologies built at Brown: PanoramicData, a pen-and-touch data visualization system, will serve as the front end; Tupleware, a main-memory analytics system that compiles complex analytics pipelines into executables, will serve as the back-end analytics component. The team expects the end result to offer a substantial speed-up (50% or more) over state-of-the-art solutions for common analytics workloads. The project web site (http://database.cs.brown.edu/projects/20-20/) will include information on the project, publications, public datasets and code.
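The "early and progressive results" idea mentioned above can be illustrated with a minimal sketch of progressive aggregation: instead of scanning the whole table before answering, the system emits a running estimate plus a shrinking error bar, so an interactive user can stop as soon as the answer is good enough. The function name, chunking scheme, and normal-approximation confidence interval below are illustrative assumptions, not the actual design of the 20/20 system.

```python
import math
import random

def progressive_mean(stream, chunk_size=1000, z=1.96):
    """Yield (rows_seen, estimate, half_width) tuples while scanning a
    numeric stream: a running mean plus an approximate 95% confidence
    half-width (normal approximation) that narrows as more rows arrive."""
    n = 0
    total = 0.0
    total_sq = 0.0
    for x in stream:
        n += 1
        total += x
        total_sq += x * x
        if n % chunk_size == 0:
            mean = total / n
            # Population variance of the rows seen so far; clamp tiny
            # negative values caused by floating-point rounding.
            var = max(total_sq / n - mean * mean, 0.0)
            yield n, mean, z * math.sqrt(var / n)

# Usage: estimates converge toward the true mean of the data
# (0.5 for uniform draws on [0, 1)), and the interval narrows.
random.seed(7)
data = (random.random() for _ in range(100_000))
for rows, est, hw in progressive_mean(data, chunk_size=20_000):
    print(f"after {rows:6d} rows: {est:.4f} ± {hw:.4f}")
```

A front end could render each yielded tuple as an updated bar with an error whisker, which is one simple way a visual interface can surface partial answers long before the full scan finishes.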

Agency: National Science Foundation (NSF)
Institute: Division of Information and Intelligent Systems (IIS)
Application #: 1514491
Program Officer: Maria Zemankova
Project Start:
Project End:
Budget Start: 2015-09-01
Budget End: 2019-08-31
Support Year:
Fiscal Year: 2015
Total Cost: $1,000,000
Indirect Cost:
Name: Brown University
Department:
Type:
DUNS #:
City: Providence
State: RI
Country: United States
Zip Code: 02912