III: Medium: Collaborative Research: U4U - Taming Uncertainty with Uncertainty-Annotated Databases

Glavic, Boris

Abstract

Uncertainty is prevalent in data analysis, regardless of the size of the data, the application domain, or the type of analysis. Common sources of uncertainty include missing values, sensor errors, bias, outliers, and many other factors. Classical deterministic data management does not track uncertainty and thus requires data quality issues to be resolved before data is ingested into the system, which is often not feasible. The net effect is that inherently uncertain data is treated as certain. If ignored, however, data uncertainty results in hard-to-trace errors, which in turn can have severe real-world implications such as unfounded scientific discoveries, financial damages, or even medical decisions based on incorrect data. While techniques for managing incomplete data exist, they are generally too heavy-weight for real-world usage and may hide relevant information from users. The goal of this project is to develop light-weight techniques for managing uncertain data that empower a wide range of applications to manage uncertainty.

Current methods for managing uncertain data are often computationally expensive and are only applicable to limited types of queries. The planned research will result in novel methods for managing uncertain data that bridge the gap between deterministic and incomplete data management. The foundation of this project is uncertainty-annotated databases, which enrich data with uncertainty labels and provide semantics for propagating these labels through queries. The result is a strict generalization of classical data management that combines the performance, generality, and ease-of-use of deterministic data management with the strong correctness guarantees of incomplete database techniques. Achieving this goal is highly non-trivial, because query evaluation over uncertain data is intractable, even for relatively simple uncertain data models and restricted classes of queries. Three main research thrusts will be explored that address the main challenges in developing such a technique: (i) uncertainty-annotated databases will be extended with attribute-level annotations and a compact encoding of an over-approximation of possible answers, which enables the approach to handle missing data and to deal with non-monotone queries such as queries with aggregation; (ii) methods for compactly approximating incomplete databases will be developed to deal with the large or even infinite sets of possible results produced by queries over uncertain data; (iii) optimized algorithms for query evaluation over uncertainty-annotated databases will be developed to address the performance limitations of queries over uncertain data. The planned work will significantly enhance the state-of-the-art in uncertain data management by, for the first time, enabling principled uncertainty management for complex queries at a reasonable cost.
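
As a rough illustration of what "enriching data with uncertainty labels and propagating these labels through queries" could look like, the following minimal Python sketch attaches a boolean "certain" label to each row and carries it through selection, join, and projection. All names (Row, select, join, project) and the sample data are hypothetical, and the propagation rule used here (a joined row is certain only if both inputs are certain; a projected row is certain if at least one contributing row is certain) is an assumption made for illustration, not a statement of the project's actual semantics.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    """A tuple of values plus a hypothetical uncertainty label."""
    values: tuple
    certain: bool  # True if the row is assumed to be certainly correct


def select(rows, predicate):
    # Selection keeps matching rows and leaves their labels unchanged.
    return [r for r in rows if predicate(r.values)]


def join(left, right, on):
    # Assumed rule: a joined row is certain only if both inputs are certain.
    return [
        Row(l.values + r.values, l.certain and r.certain)
        for l in left
        for r in right
        if on(l.values, r.values)
    ]


def project(rows, columns):
    # Assumed rule: after duplicate elimination, a result row is certain
    # if at least one of the rows that produced it is certain.
    merged = {}
    for r in rows:
        key = tuple(r.values[i] for i in columns)
        merged[key] = merged.get(key, False) or r.certain
    return [Row(key, label) for key, label in merged.items()]


# Hypothetical sample data: the location of sensor s2 was imputed,
# and one reading of sensor s1 is a suspected outlier.
sensors = [Row(("s1", "lab"), True), Row(("s2", "roof"), False)]
readings = [
    Row(("s1", 21.5), True),
    Row(("s2", 19.0), True),
    Row(("s1", 22.0), False),
]

# Which locations have at least one reading, and is that answer certain?
joined = join(sensors, readings, on=lambda s, r: s[0] == r[0])
result = project(joined, columns=[1])

for row in result:
    print(row.values, "certain" if row.certain else "uncertain")
```

In this sketch the query runs much like an ordinary deterministic query, with one extra boolean per row. The answer "lab" is labeled certain because one certain derivation supports it, while "roof" stays uncertain because it depends on an imputed location.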

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Funding Agency

Agency: National Science Foundation (NSF)
Institute: Division of Information and Intelligent Systems (IIS)
Type: Standard Grant (Standard)
Application #: 1956123
Program Officer: Wei-Shinn Ku

Project Start
Project End
Budget Start: 2020-10-01
Budget End: 2024-09-30
Support Year
Fiscal Year: 2019
Total Cost: $466,569
Indirect Cost

Institution

Illinois Institute of Technology, Chicago, IL, United States
