The goal of this project is to build Commugrate, a community-driven data integration system that capitalizes on the information gained from the interactions of communities of humans with data sources.
Commugrate tackles key challenges raised by the growing number of information sources used in science, engineering, and industry, and by the need for large-scale data integration solutions that enable effective access to these sources. These challenges include schema matching and mapping, record linkage, and data repair.
More specifically, Commugrate (i) utilizes both direct and indirect contributions from different types of human communities, with a focus on the latter, (ii) solves key data integration issues using new forms of evidence, such as usage and behavior data, that have not previously been used, (iii) adopts a new technique for schema matching that defines a class of its own, namely usage-based schema matching, (iv) introduces a first-of-its-kind technique for record linkage based on entities' behavior, and (v) provides an adaptive feedback system that improves the quality of the data by making the best use of user feedback.
Commugrate has a broad impact across multiple segments of society, as data integration is by far the most important and at the same time most vexing issue in many areas of science, engineering, and industry. Furthermore, leveraging users' interactions with data sources, especially indirect interactions, may provide several benefits and help solve many intractable data integration tasks that cannot be done without human intervention.
PhD students will pursue research in this project. Publications, technical reports, software, and experimental data from this research will be disseminated via the project web site at www.purdue.edu/cybercenter/commugrate.