Prior research on data integration led to XML mediators, which provide applications with a virtual integrated XML view that offers a single point of access to the data of multiple distributed and heterogeneous information sources. The resulting industrial products address Enterprise Information Integration and follow the Global-As-View paradigm, where the owner of the integrated view specifies it as an XQuery function of the source views.

The emergence of Service-Oriented Architectures and of large-scale Internet-based data integration, which is often needed in the science domains, renders Global-As-View-based integration insufficient to benefit from the emerging opportunities and inefficient in large-scale environments. Service-Oriented Architectures provide live access to data sets by offering a set of web service calls to them, as opposed to the full query access to the data sets that prior work studied. Large-scale data integration highlights the development, deployment and maintenance bottleneck that the Global-As-View paradigm creates by requiring the integrated view owner to know the schema, data formats and semantics of all sources.

The proposed mediator provides a scalable solution by employing a new Global-Local-As-View paradigm in an XML setting: Each source owner can become responsible for fitting her source data and web services to the integrated view.

Fundamental algorithmic and system innovations are needed to implement the paradigm. First, queries over the integrated view must be rewritten to use the source services and data. Existing algorithms for rewriting relational queries over relational views are not sufficient, as they address neither services nor the rich structure of XML queries. Second, the source owner needs an appropriate language and visual tools to export interfaces in a way that can range from full access to the underlying database to a few parameterized queries, and anywhere in between. Finally, the client needs corresponding systems to understand which XQueries can be asked over the integrated view.

Achieving the above innovations will enable the development of the rewriting module of the UCSD-XMED XQuery mediator, which can become a valuable information sharing and publishing component, especially for science portals, where scientists will connect their databases of experimental data to XML-based portals for integrated access to scientific information. The UCSD-XMED query processor will be used by graduate and undergraduate students in database and middleware technologies.

Updates about the project, including its people, software distributions and publications, are available at http://db.ucsd.edu/NSF07xmlMed/

Agency: National Science Foundation (NSF)
Institute: Division of Information and Intelligent Systems (IIS)
Type: Standard Grant (Standard)
Application #: 0713672
Program Officer: Vijayalakshmi Atluri
Budget Start: 2007-09-15
Budget End: 2011-08-31
Fiscal Year: 2007
Total Cost: $450,000

Name: University of California San Diego
City: La Jolla
State: CA
Country: United States
Zip Code: 92093