Information is one of the biggest assets for most enterprises. In today's information age, almost every enterprise decision is based on a detailed analysis of data recorded in diverse sources, ranging from structured databases to the World Wide Web. To ensure that data retrieved from different sources is used appropriately and within context, it is imperative that the provenance of the data be recorded and made available to its users. Provenance refers to the knowledge that enables a piece of data to be interpreted correctly. It is the essential ingredient that ensures that users of data (for whom the data may or may not have been originally intended) understand the background of the data. This includes elements such as who (person) or what (process) created the data, where it came from, how it was transformed, the assumptions made in generating it, and the processes used to modify it. The research team will investigate the semantics of data provenance, develop an ontology to represent these semantics, and develop ways to automate the capture of provenance. Using new product design and development as the real-world domain, a partnership will be formed with a large defense contractor, Raytheon Missile Systems, located in Tucson, Arizona, to investigate these research issues. A testbed will be created to capture and use provenance, and the system's utility will be evaluated using a well-defined set of metrics. Raytheon has committed considerable resources in the form of personnel and access to software as needed for this research.
The intellectual merit of this proposal stems from its theoretical framework for understanding and representing the semantics of data provenance. This differs considerably from existing work on provenance, which has mainly explored the 'where' and 'why' of provenance. This work will pave the way for understanding the extent to which provenance can be captured automatically.
The project has the potential for broader impacts on society. Most importantly, techniques to represent, capture, and deploy provenance could revolutionize product development in the defense industry serving the Department of Defense, as well as in other domains. The ultimate goal is to enable the development of autonomic and interoperable enterprise data management systems.