CAREER: A Scalable, Declarative, Imprecise Database Management System

Re, Christopher

Abstract

The unprecedented amounts of data available to individuals, companies, governments, and scientists promise to revolutionize the way entertainment, business, governance, and science operate. Yet while data are cheap and plentiful, they are often of lower quality than the precise data that have been managed for the last 30 years. Building an application that processes such imprecise data is difficult: it requires developers to handle standard data management challenges (e.g., concurrency and scalability) while also coping with imprecise and incomplete data, which is typically done using statistical or machine learning techniques (e.g., interpolation and classification). The Hazy project addresses this challenge by building a system that integrates the paradigms of relational database management systems with statistical machine learning techniques. The project pursues three major tasks: (I) designing a language that integrates these techniques with standard SQL, (II) proposing an algebra to implement this language, with support for automatic optimization (as in a standard RDBMS), and (III) discovering techniques to efficiently maintain the statistical models as the underlying data change or are updated. The end goal is a system that makes developing scalable applications over imprecise data as easy as developing their precise counterparts. Hazy allows users to process larger amounts of data with more sophisticated statistical processing than ever before. In turn, this enables new applications in a diverse set of areas, such as life and physical science sensing, health-care and environmental monitoring, and enterprise-based and Web-based information extraction.

The research from this project is used to develop the data and infrastructure for new practicum-style courses under development at the University of Wisconsin-Madison. In addition, this infrastructure will be used in an outreach effort to give high school students access to data analysis tools. The source code of Hazy is released as open source, and the results are disseminated on the project Web site (www.cs.wisc.edu/hazy/).

Funding Agency

Agency: National Science Foundation (NSF)
Institute: Division of Information and Intelligent Systems (IIS)
Application #: 1054009
Program Officer: Frank Olken

Budget Start: 2011-05-01
Budget End: 2013-09-30
Fiscal Year: 2010
Total Cost: $400,155
Institution: University of Wisconsin-Madison, Madison, WI, United States