When dealing with massive quantities of data, ranking and aggregate queries are powerful techniques for focusing attention on the most important answers. Many applications that produce such data also inherently introduce uncertainty, for example, probabilistic matches in data integration, imprecise measurements from sensors, fuzzy duplicates in data cleaning, and inconsistencies in scientific data. Hence, these queries are even more important for probabilistic data, where a single relation can encode exponentially many possible worlds. Uncertainty also admits many possible definitions of ranking and aggregate queries. This project systematically examines the properties underlying the rich semantics of ranking and aggregate queries over large amounts of probabilistic data. More importantly, it investigates how to design novel and scalable algorithms for processing these queries efficiently in various settings, such as the offline centralized environment, distributed systems, and the streaming model.

With the emergence of probabilistic data in many important application domains, demand from the scientific community and beyond (e.g., government and military agencies) for understanding and efficiently processing ranking and aggregate queries is expected to intensify in the coming years. The results of this project lay a firm foundation for addressing this important problem.

For further information, such as publications, data sets and source code, please see the project website at www.cs.fsu.edu/~lifeifei/rankaggprob

Agency: National Science Foundation (NSF)
Institute: Division of Information and Intelligent Systems (IIS)
Application #: 0916488
Program Officer: Gia-Loi Le Gruenwald
Project Start:
Project End:
Budget Start: 2009-09-01
Budget End: 2012-01-31
Support Year:
Fiscal Year: 2009
Total Cost: $328,831
Indirect Cost:
Name: Florida State University
Department:
Type:
DUNS #:
City: Tallahassee
State: FL
Country: United States
Zip Code: 32306