Buneman, Peter University of Pennsylvania $174,951
DLI Phase 2 - DATA PROVENANCE
This project will address issues associated with data provenance. Provenance is concerned with how information has arrived at the form in which it appears: who produced it, who has corrected it, how old it is, how it was originally produced, and so forth. Understanding provenance has occupied scientists, historians, textual critics, and other scholars for centuries.
The provenance of data in databases is a newer and larger problem, because one is interested in data at all levels of granularity, from a single pixel in a digital image to a whole database. Just as scholars comment on documents by attaching annotations (marginalia) to text, part of the solution to recording provenance is the attachment of annotations to components of databases. Database researchers have recently considered loosely structured forms of data and have developed software systems for querying and storing such data. This work is closely related to new formats that have been developed for structured documents on the Web. By advancing new data models, new query languages, and new storage techniques, this technology is expected to provide the substrate for recording and tracking provenance.