Data are increasingly generated, stored, and processed in a distributed fashion. At the same time, generating large amounts of data inherently introduces ambiguity, uncertainty, and errors, especially in a distributed setup. Such data are best represented in a distributed probabilistic database. In distributed data management, summary queries, such as top-k queries, heavy hitters (also known as frequent items), histograms and wavelets, and threshold monitoring queries, are useful tools for obtaining the most important answers from massive quantities of data effectively and efficiently. This project investigates novel query processing techniques for a variety of important summary queries over distributed probabilistic data.

Broadly, this project examines both snapshot summary queries in static (i.e., no updates) distributed probabilistic databases and continuous summary queries in dynamic (i.e., with updates) distributed probabilistic databases. A number of techniques are explored to design novel communication- and computation-efficient algorithms for processing these queries.

A distributed probabilistic data management system (DPDMS) prototype is implemented based on the query processing techniques developed in this project. This DPDMS is released to, and used in practice by, scientists and engineers from other scientific disciplines as well as industry.

Graduate and undergraduate students, including those from minority groups, are actively involved in this project. Findings from the project have been integrated into different courses, demos, and educational projects. For further information, such as publications, data sets, source code, and education initiatives, please visit the project website at www.cs.fsu.edu/~lifeifei/dpdm.

Agency: National Science Foundation (NSF)
Institute: Division of Information and Intelligent Systems (IIS)
Application #: 1053979
Program Officer: Sylvia Spengler
Project Start:
Project End:
Budget Start: 2011-02-01
Budget End: 2011-11-30
Support Year:
Fiscal Year: 2010
Total Cost: $102,579
Indirect Cost:

Name: Florida State University
Department:
Type:
DUNS #:
City: Tallahassee
State: FL
Country: United States
Zip Code: 32306