Uncertain or incomplete data abounds in many database applications. For example, data received from a scientific instrument may contain measurement errors, market forecasts may identify several feasible scenarios, and medical diagnostic test results may indicate several probable illnesses for a patient. However, most existing database systems treat all data as "correct", which can lead to invalid conclusions. This project investigates the management of uncertain data. In particular, a new probabilistic data model is developed as an extension to the relational model, in which probabilities or confidences can be associated with the values of attributes. These probabilities are given as a discrete probability distribution function, which may be incompletely specified. Such a model is believed to provide a natural formalism for describing the uncertain real world. Operators are developed for manipulating probabilistic data. They include operators analogous to the conventional relational operators, as well as new operators for dealing with probabilities (e.g., combining two distributions for the same attribute). The model and the operators are validated using real-world examples. A prototype database management system that uses the probabilistic data model is designed to serve as a basis for further research and to be implemented in future years.
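
The abstract does not define the model or its operators, so the following Python sketch is only an illustration of the ideas it names, under stated assumptions. The class `PartialDist`, the functions `select` and `combine`, the threshold-based selection rule, and the normalized-product combination rule are all hypothetical choices made for this example and are not taken from the project.

```python
from fractions import Fraction


class PartialDist:
    """Discrete distribution over the values of one attribute.

    Assumption: "incompletely specified" is modeled as probabilities
    that may sum to less than 1; the remainder is unassigned mass.
    """

    def __init__(self, probs):
        self.probs = {v: Fraction(p) for v, p in probs.items()}
        total = sum(self.probs.values())
        if any(p < 0 for p in self.probs.values()) or total > 1:
            raise ValueError("probabilities must be >= 0 and sum to at most 1")
        self.missing = 1 - total  # mass not assigned to any listed value

    def bounds(self, value):
        """Lower and upper bound on P(attribute == value)."""
        low = self.probs.get(value, Fraction(0))
        return low, low + self.missing


def select(relation, attr, value, threshold):
    """Analogue of relational selection (hypothetical rule): keep the
    tuples whose lower-bound probability of attr == value meets the
    threshold."""
    return [row for row in relation if row[attr].bounds(value)[0] >= threshold]


def combine(d1, d2):
    """Combine two fully specified distributions for the same attribute
    as a normalized product. This assumes independent sources and is
    only one of several possible combination rules."""
    if d1.missing or d2.missing:
        raise ValueError("this sketch combines fully specified distributions only")
    product = {v: d1.probs[v] * d2.probs[v] for v in d1.probs if v in d2.probs}
    total = sum(product.values())
    if total == 0:
        raise ValueError("the two distributions share no possible value")
    return PartialDist({v: p / total for v, p in product.items()})


# Two patients with uncertain diagnoses; patient A leaves 0.1 unassigned.
patients = [
    {"name": "A", "diagnosis": PartialDist({"flu": "0.6", "cold": "0.3"})},
    {"name": "B", "diagnosis": PartialDist({"flu": "0.2", "cold": "0.8"})},
]
likely_flu = select(patients, "diagnosis", "flu", Fraction(1, 2))

# Two independent test results for the same patient.
merged = combine(
    PartialDist({"flu": "0.5", "cold": "0.5"}),
    PartialDist({"flu": "0.8", "cold": "0.2"}),
)
```

Here `likely_flu` contains only patient A, whose probability of flu lies between 3/5 and 7/10 because of the unassigned mass, and `merged` assigns 4/5 to flu and 1/5 to cold. Exact fractions are used instead of floats so that the sum-to-at-most-one check is not affected by rounding.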

Project Start:
Project End:
Budget Start: 1989-09-01
Budget End: 1992-02-28
Support Year:
Fiscal Year: 1989
Total Cost: $60,000
Indirect Cost:
Name: Princeton University
Department:
Type:
DUNS #:
City: Princeton
State: NJ
Country: United States
Zip Code: 08540