This award is funded under the American Recovery and Reinvestment Act of 2009 (Public Law 111-5).

Interoperability of heterogeneous data is a critical problem faced by every modern enterprise that is concerned with data analysis, data migration, and data evolution. The fundamental goal in data interoperability is to facilitate and make transparent to end-users the extraction of information from multiple heterogeneous data sources that reside in different locations. At the heart of achieving data interoperability is the design and management of schema mappings. A schema mapping is a specification of the relationship between two database schemas. Schema mappings are the essential building blocks in specifying how data from different sources are to be integrated into a unified format or exchanged (i.e., translated) into a different format.
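To make the notion of a schema mapping concrete, here is a minimal sketch in Python. The schemas and names (Emp, Person, WorksIn, apply_mapping) are hypothetical and chosen only for illustration; they do not come from the project. The mapping states that every source fact Emp(name, dept, city) implies the target facts Person(name, city) and WorksIn(name, dept), and the function exchanges (translates) a source instance into the target format.

```python
from typing import Dict, Set, Tuple

# A database instance: relation name -> set of tuples (facts).
Instance = Dict[str, Set[Tuple]]


def apply_mapping(source: Instance) -> Instance:
    """Translate a source instance into the target schema.

    Hypothetical source schema: Emp(name, dept, city)
    Hypothetical target schema: Person(name, city), WorksIn(name, dept)
    Mapping: each Emp(n, d, c) implies Person(n, c) and WorksIn(n, d).
    """
    target: Instance = {"Person": set(), "WorksIn": set()}
    for name, dept, city in source.get("Emp", set()):
        target["Person"].add((name, city))
        target["WorksIn"].add((name, dept))
    return target


# Illustrative data, not taken from the project.
source = {"Emp": {("Ada", "R&D", "Santa Cruz"), ("Bob", "Sales", "San Jose")}}
target = apply_mapping(source)
```

In practice schema mappings are usually written declaratively (for example, as logical constraints) rather than as procedural code; the function above only illustrates what one such specification means on a small instance.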

The intellectual merit of this project is the development of a solid foundation and a suite of techniques and tools for designing, understanding, and managing schema mappings. Earlier foundational work on schema mappings has mainly focused on the semantics and algorithmic issues of some of the basic operators for manipulating schema mappings, with emphasis on the composition operator and the inverse operator. While the composition operator is well understood by now, much more remains to be done in the study of the inverse operator. One of the main goals of this project is to investigate in depth the inverse operator and also the difference operator, which remains largely unexplored to date. This project addresses several fundamental questions for the inverse and the difference operators, including the following: What is the right semantics for these two operators? What is the exact language for expressing these operators? Are there efficient algorithms for computing the result of the inverse operator and the difference operator? A parallel goal of this project is the development of a set of concepts and techniques for optimizing schema mappings and transforming more complex schema mappings into simpler, yet equivalent, ones. The final main goal of this project is to study the problem of using data examples to explain and illustrate schema mappings. The design of schema mappings between two schemas is known to be one of the most costly and time-consuming tasks in achieving data interoperability. Prior studies have suggested that (familiar) data examples can be extremely powerful aids in designing schema mappings. This project addresses the following questions: What is the right notion or notions of illustrative data examples for schema mappings? How easy or difficult is it to compute small examples for illustrating schema mappings? How can one illustrate large and complex networks of schema mappings with data examples? How can one depict the similarities and differences among multiple schema mappings?
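The following sketch, in Python, uses a pair of small data examples to suggest why the semantics of the inverse operator is subtle. It is an illustration under stated assumptions, not the project's method: the mapping project_names and the relations Emp(name, dept) and Person(name) are hypothetical. Because the mapping drops the dept attribute, two different source instances are translated to the same target instance, so no operator can recover the original source exactly from the target alone.

```python
from typing import FrozenSet, Tuple

Row = Tuple[str, ...]


def project_names(emp: FrozenSet[Row]) -> FrozenSet[Row]:
    """Hypothetical mapping: Emp(name, dept) -> Person(name), dropping dept."""
    return frozenset((name,) for name, _dept in emp)


# Two distinct source instances (illustrative data only).
source_a = frozenset({("Ada", "R&D")})
source_b = frozenset({("Ada", "Sales")})

# Both are translated to the same target instance, so the department
# information is lost and cannot be reconstructed from the target.
same_target = project_names(source_a) == project_names(source_b)
```

A pair of data examples like source_a and source_b is also a small instance of what the questions above call illustrating a schema mapping: it shows concretely which information the mapping preserves and which it discards.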

The broader impact of this project is the development of human resources in science and engineering through the teaching, mentoring, and research training of graduate and undergraduate students on the foundational and system development work of this project. Further information about publications, course material, and software prototypes and tools developed through this project can be found at the project web page http://datainterop.cs.ucsc.edu

Agency: National Science Foundation (NSF)
Institute: Division of Information and Intelligent Systems (IIS)
Type: Standard Grant (Standard)
Application #: 0905276
Program Officer: Vijayalakshmi Atluri
Project Start:
Project End:
Budget Start: 2009-07-01
Budget End: 2013-09-30
Support Year:
Fiscal Year: 2009
Total Cost: $1,151,548
Indirect Cost:
Name: University of California Santa Cruz
Department:
Type:
DUNS #:
City: Santa Cruz
State: CA
Country: United States
Zip Code: 95064