This Small Grant for Exploratory Research (SGER) project will extend data integration technologies in partnership with two Texas state agencies and the City of Gainesville, TX, focusing on legacy criminal justice databases.
Technically, the project will examine (1) architectural solutions for constructing mediated databases, and (2) new methods for automating schema matching. More specifically, the research will demonstrate the feasibility of representing the mappings between the virtual global database and the various data sources as XQuery queries. XQuery is expected to offer efficient data processing, an expressive way to describe the data mappings, and potential for improved standardization. The project's second task will address the schema matching problem among heterogeneous databases, investigating several procedures based on techniques already successfully employed in Computational Linguistics.
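As a rough illustration of what name-based schema matching can look like, the following minimal sketch tokenizes column names and pairs columns by Jaccard similarity of their token sets. This is an assumed, simplified stand-in for the linguistics-based procedures mentioned above, not the project's actual method; the function names, the 0.5 threshold, and the sample column names are all hypothetical.

```python
import re


def tokenize(name):
    """Split a column name into a set of lowercase tokens.

    Handles camelCase boundaries and any non-alphanumeric separator.
    """
    spaced = re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", name)
    return {t.lower() for t in re.split(r"[^A-Za-z0-9]+", spaced) if t}


def jaccard(a, b):
    """Jaccard similarity of two token sets (0.0 when both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def match_schemas(source_cols, target_cols, threshold=0.5):
    """Map each source column to its best-scoring target column.

    Columns whose best score falls below the (assumed) threshold are
    left unmatched and omitted from the result.
    """
    matches = {}
    for s in source_cols:
        best, best_score = None, 0.0
        for t in target_cols:
            score = jaccard(tokenize(s), tokenize(t))
            if score > best_score:
                best, best_score = t, score
        if best is not None and best_score >= threshold:
            matches[s] = (best, best_score)
    return matches


# Hypothetical county (source) and state (target) column names.
county_cols = ["DefendantLastName", "arrest_date", "OffenseCode", "booking_no"]
state_cols = ["defendant_last_name", "date_of_arrest", "offense_code", "court_id"]
matches = match_schemas(county_cols, state_cols)
```

A purely lexical matcher like this misses synonyms and abbreviations (for example, "DOB" versus "birth_date"), which is one reason richer linguistic techniques are of interest for this problem.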
Intellectual Merit
This research concerns integrated information technology related to the development of metadata and the automation of schema matching by means of linguistics-based techniques. The proposed study will be vertically integrated, with significant effort devoted to research on the principles underlying the proposed projects, development of practical technology based on that research, and demonstrations and test beds that illustrate the technology. The major goal of the proposed research is the creation of a prototype information system for integrating legacy criminal justice information within the State of Texas. The effectiveness of the prototype will then be assessed, and the results of the evaluations will be used to develop general-purpose approaches to data assimilation and schema integration.
Broader Impact
The proposed research will benefit the State of Texas by helping counties transmit legacy criminal justice data to the State's information system. By the end of the proposed research, the team will complete: (1) a prototype architecture and a set of specifications that support integration of justice information; and (2) a set of tools and techniques for automated schema matching. Beyond the direct involvement of graduate students conducting research toward their theses and dissertations, the project will develop artifacts for courses in the areas of databases, natural language processing, and machine learning. The research will also involve undergraduate students and members of the nearby community.