In many scientific disciplines today, a community of users is working together to assemble, revise, and curate a shared data repository. As the community accumulates knowledge and the database content evolves over time, it may contain conflicting information and members can disagree on the information it should store. Relational database management systems (RDBMS) today can help these communities manage their shared data, but provide limited support for managing conflicting facts and conflicting opinions about the correctness of the stored data.

This project develops a Belief Database System (BeliefDB) that allows users to express belief annotations. These annotations can be positive (agreement) or negative (disagreement), and can be of higher order (belief annotations about other belief annotations). The approach allows users to have a structured discussion about the database content and annotations. A BeliefDB gives annotations a clearly defined semantics that lets a relational database understand and manage them efficiently.

Intellectual merits: (i) Definition of a Belief Database Model: The project develops a formalism that extends a relational database with belief annotations on data and on previously inserted annotations.

(ii) Design of a Belief Query Language: The project complements the data model with a new query language that extends SQL. (iii) Development of a canonical Belief Database Representation: The projects develops approaches to store and manipulate belief databases on top of a conventional RDBMS.

Broader impact: Curated databases and shared data repositories are becoming widespread in the scientific communities. A BeliefDB provides a new data management system that addresses the need of these communities to manage conflicting data. If successful, the project will be one of the pieces that will help data management technology undergo a new paradigm shift, from managing data as content, to supporting a community of users in collaboratively creating partly conflicting database contents.

For further information on the project see the project web page:: http://db.cs.washington.edu/beliefDB/

Project Report

Today's data management systems can no longer simply focus on managing clean data. They must also handle inaccurate, conflicting, and evolving data with extended user discussions and documentation. Large-scale shared data repositories are increasingly common in many sciences including astronomy, coastal and environmental sciences, earth sciences, coastal and environmental scientists, and many more. They can be also seen in non-scientific collaborative communities such as social networks, knowledge generating communities, even in web forums on role-playing games. The BeliefDB project developed fundamental techniques to enable users to understand, manage, and represent different possible belief worlds of what should be in the database and to have a structured argumentation over the database content and each other's annotations. The project had several outcomes, which can be classified in six categories: Representing and reconciling conflicting data. A framework called BeliefDB was developed under which every fact in a relational database can be annotated with a statement saying which agent believed that fact. An extension of a relational query language (e.g. SQL) can than specify which agents' beliefs are to be retrieved. Queries in this extended language can be entirely rewritten into traditional relational queries. A separate question is how to reconcile conflicting facts given a Web of trust mappings between users; it was shown that the complexity of reconciliation can range from polynomial time to NP-complete, depending on the type of conflicts that one wants to represent. Querying inconsistent data. A breakthrough theoretical result was obtained on the complexity of computing certain answers to queries over databases with key violations: it was shown that for every conjunctive query without self-joins where every relation has a unique key consisting of a single attribute, the complexity of computing all certain answers is either in polynomial time, or is co-NP-complete in the size of the database. Moreover, one can decide between these two cases by performing a static analysis on the query expressions. Explaining query outputs. A novel framework was developed for explaining observed answers to queries. In one application this technique was deployed to help users understand the performance behaviors of large, distributed systems. In a more general setting, the framework finds explanations to outputs to complex queries. Lower bounds on grounded probabilistic inference. New lower bounds were proven on the limitations of DPLL-based inference algorithm for computing queries over probabilistic databases. All modern model counting algorithm are based on variations of the DPLL backtracking procedure. It was shown that there exists First Order formulas (queries) whose grounded Boolean formulas causes any DPLL-based algorithm to run in exponential time, although lifted inference compute the probability of the query in polynomial time. This is the first formal proof of the additional power of lifted inference. Upper and lower Bounds for Boolean functions. New upper and lower bounds for the probability of positive Boolean functions were derived by treating multiple occurrences of variables as independent and assigning them new individual probabilities. This method can be used to approximate weighted model counting problems. As example application, it was shown how existing relational databases can be used to give upper and lower bounds to hard probabilistic queries. Query Interpretation. A novel visualization tool was developed that reduces the time needed to read and understand existing SQL queries. It thus enables effective query-reuse, a principal component in the vision of a Collaborative Query Management System where users start from existing query templates when composing new queries.

Project Start
Project End
Budget Start
2009-09-01
Budget End
2013-08-31
Support Year
Fiscal Year
2009
Total Cost
$508,000
Indirect Cost
Name
University of Washington
Department
Type
DUNS #
City
Seattle
State
WA
Country
United States
Zip Code
98195