Data integration systems provide uniform access to a multitude of data sources. They have the potential to revolutionize the way we access data and to provide a basis on which to build even more advanced information processing architectures. Today, however, such systems are still extremely hard to build and costly to maintain. They must be told in tedious detail how to interact with data sources, and they must be constantly modified to deal with changes at the sources. To address this problem, the project envisions building data integration systems that learn to evolve and self-manage over time, with minimal human intervention. To make fundamental contributions toward realizing this vision, the project employs database and artificial intelligence (especially machine learning) techniques to attack the following central challenges: (a) effectively automating key labor-intensive tasks, including schema matching, global schema creation, and duplicate detection; (b) detecting system failures due to changes at the sources, with minimal human intervention; and (c) further reducing the tremendous data integration burden on system administrators by spreading that burden thinly over the mass of users.

The education plan leverages the research to prepare students and the broader community for the novel data management challenges raised by the Internet world. In terms of intellectual merit, the project takes a next logical step in data integration research. It brings conceptually novel solutions to fundamental issues underlying virtually any data integration or sharing effort. The project results have the potential for autonomic-computing applications. In terms of broader impacts, the project will facilitate the widespread deployment of data integration systems, resulting in more effective information management and access for society. It plays an integral part in educating next-generation professional workers and researchers. The research will also help integrate data for rural Illinois firefighters and train them to access and use the integrated information systems. The project information will be disseminated via publications, workshops, tutorials, and the Web site www.cs.wisc.edu/~anhai/projects/career.html, which will include the research results, data, and system artifacts.

Agency: National Science Foundation (NSF)
Institute: Division of Information and Intelligent Systems (IIS)
Application #: 0712836
Program Officer: Xiaoyang Wang
Project Start:
Project End:
Budget Start: 2006-08-31
Budget End: 2010-05-31
Support Year:
Fiscal Year: 2007
Total Cost: $238,115
Indirect Cost:
Name: University of Wisconsin Madison
Department:
Type:
DUNS #:
City: Madison
State: WI
Country: United States
Zip Code: 53715