Information is one of the biggest assets of most enterprises. In today's information age, almost every enterprise decision is based on a detailed analysis of data recorded in diverse sources, ranging from structured databases to the World Wide Web. To ensure that data retrieved from different sources is used appropriately and in context, it is imperative that the provenance of the data be recorded and made available to its users. Provenance refers to the knowledge that enables a piece of data to be interpreted correctly. It is the essential ingredient that ensures that users of data (for whom the data may or may not have been originally intended) understand its background. This includes elements such as who (person) or what (process) created the data, where it came from, how it was transformed, the assumptions made in generating it, and the processes used to modify it. This research team will investigate the semantics of data provenance, develop an ontology to represent those semantics, and develop ways to automate the capture of provenance. Using new product design and development as the real-world domain, a partnership will be formed with a large defense contracting company, Raytheon Missile Systems, located in Tucson, Arizona, to investigate these research issues. A testbed will be created to capture and use provenance, and the system's utility will be evaluated using a well-defined set of metrics. Raytheon has committed considerable resources in the form of personnel and access to software as needed for this research.

The intellectual merit of this proposal stems from its theoretical framework for understanding and representing the semantics of data provenance. This is considerably different from existing work on provenance, which has mainly explored the 'where' and 'why' of provenance. This work will also pave the way for understanding the extent to which provenance can be captured automatically.

The project has the potential for broader impacts on society. Most importantly, the development of techniques to represent, capture, and deploy provenance has the potential to revolutionize product development for the Department of Defense, and in other domains as well. The ultimate goal is to enable the development of autonomic and interoperable enterprise data management systems.

Agency: National Science Foundation (NSF)
Institute: Division of Information and Intelligent Systems (IIS)
Type: Standard Grant (Standard)
Application #: 0455993
Program Officer: Lawrence Brandt
Project Start:
Project End:
Budget Start: 2005-05-01
Budget End: 2008-04-30
Support Year:
Fiscal Year: 2004
Total Cost: $244,404
Indirect Cost:

Name: University of Arizona
Department:
Type:
DUNS #:
City: Tucson
State: AZ
Country: United States
Zip Code: 85721