The goal of this project is to develop theoretical and implementation foundations for VGRAM, a technique that uses variable-length, high-quality grams from a collection of strings to support approximate queries on the collection. The research plan includes four tasks: 1) developing methods for VGRAM to select an optimal set of grams automatically, without requiring user-defined parameters, 2) integrating VGRAM into relational database management systems to facilitate adoption, 3) using VGRAM to support approximate keyword search in documents, and 4) evaluating VGRAM in two real applications, one for integrating Web information about family reunification and one for integrating medical information.
The research results will have a significant impact on society, because approximate string queries are needed in many applications, such as data integration and record linkage. This project supports two PhD students pursuing research in the areas of text retrieval and database systems. Publications, technical reports, software, and experimental data from this project will be disseminated via the project web site (http://flamingo.ics.uci.edu/).