Data and knowledge integration are costly processes. Consequently, most existing solutions rely on a one-size-fits-all approach, where the data are integrated upfront and the integrated data or knowledge bases are then used as is. Such snapshot-based integration solutions, however, cannot be applied effectively when the data sources are autonomous and dynamic or when, as in most scientific and decision-making applications, the assumptions, beliefs, and knowledge of domain experts are indispensable to the integration process.
The proposed work tackles the computational challenges underlying a user-driven integration (UDI) system, keeping in mind the human constraints and challenges that underlie the technical considerations. The key technical and intellectual impacts are in algorithms and data structures that can help bridge the semantic gap between the expert user and the system through a user-driven integration process based on individual user feedback. The team will specifically investigate (a) continuously revisable data/metadata alignment through vector space embeddings and probabilistic and generative models, and (b) algorithms for query processing and candidate enumeration to support feedback over graph-based models of data with alternative interpretations.
UDI has potential applications in many domains (such as science and business intelligence) that need user-driven integration to answer key questions over diverse data sets. In particular, UDI will be incorporated into the NSF-funded tDAR (the Digital Archaeological Record), which has the potential to transform archaeology's scientific endeavors by enormously advancing the capacity for synthetic research. The investigation of fundamental information integration challenges will thus contribute substantially to a shared infrastructure of science and will enable crucial transdisciplinary research concerning complex systems.
Participation in this research by computer science graduate students will prepare them to function effectively in multidisciplinary teams and enhance their appreciation of the associated challenges and opportunities. Use of UDI as a testbed will enable these students to experiment in scientific information management, thereby increasing their awareness of data integration and science-informatics issues. We expect two graduate courses to leverage the data sets as well as the project software as an educational platform. Arizona State University also recruits top-quality undergraduates through a nationally recognized residential Honors College and the Minority Access to Research Careers program, and the project will involve undergraduate honors students. UDI will also serve as a testbed for undergraduate students through Capstone Projects.