In a query-by-humming system, a user sings part of a melody and the computer identifies the songs that contain the melody. In a sign spotting system, a sign language user searches for occurrences of specific signs in a video database of sign language content. These are two example applications where users want to retrieve the best matching subsequences in a time series database given a query sequence. This project is developing methods for efficient subsequence matching in large time-series databases using the popular Dynamic Time Warping (DTW) distance measure. Embeddings are being designed that partially convert the subsequence matching problem into the much more manageable problem of similarity search in a vector space. This conversion allows leveraging the full arsenal of vector indexing and metric indexing methods for speeding up subsequence matching. The proposed methods will be applicable in a wide variety of time series domains, including, e.g., stock market modeling, seismic activity analysis, and sensor-based health monitoring. To showcase the commercial, social, and educational impact of the research, the project will produce three demonstration systems: a query-by-humming system, a handwritten document search-by-keyword system, and a sign spotting system. The results of the research are being integrated into these systems to achieve efficient retrieval in the presence of large amounts of data. The creation and dissemination of large, real-world datasets for these three systems will be an additional contribution of the project.

Project Report

In subsequence matching, given a specific sequence as input, we want to identify the best matching subsequences of (possibly long) sequences stored in a database. The sequences can be sequences of letters (for example, text, or DNA/protein sequences), sequences of numbers (such as closing stock market prices for each day of a year or a decade), sequences of images (movies), or sequences of notes (melodies). Since sequences can represent such diverse data, the subsequence matching problem is relevant for several interesting applications. An example is sign spotting. Here, the input sequence is a video of a specific sign from some sign language, e.g., American Sign Language (ASL). The database contains sign language videos. The goal is to identify spots in the database videos where the input sign occurs. Another example application is query-by-humming. Imagine having a melody in your mind, that you can't remember (or never knew in the first place) what song it is from. In a query-by-humming system, the user gives that melody as input to the system, by simply humming that melody on the computer's microphone. The system contains a large database of melodies, and it identifies the database melodies that contain parts similar to what the user hummed. In this project, our goal was to different design methods, sometimes more generally applicable, sometimes applicable to more specific types of data, that can improve the state-of-the-art in subsequence matching. Improving the state-of-the-art means improving accuracy and efficiency in such systems, so that the system identifies more often the correct results, and the system produces those results to the user faster. INTELLECTUAL MERIT: In this project, we have produced several novel methods that improve the state-of-the-art in subsequence matching. Here are some highlights: - In our TODS 2011 paper we describe a general method for speeding up subsequence matching. Our method is applicable in cases where similarity between sequences is measured using a method called "dynamic time warping" (DTW). DTW has been used successfully for several types of data, including gesture and sign language recognition, and query by humming. - In our PAMI 2009 paper, we describe a method that can be used for recognition and spotting of gestures and signs. Alternative methods for this problem typically assume that the system can accurately detect hand locations. Because of that, they often require users to wear long sleeves, and the background to be uniformly blue or black, so that hands can be detected accurately. Our method achieves good accuracy even when hand detection is relatively inaccurate, and thus does not make requirements such as long sleeves and simple backgrounds. - In our ICPR 2012 paper, we describe a computer vision method for detecting falls in videos captured using a depth camera (such as the popular Kinect camera from Microsoft). A fall detection system can be useful for monitoring elderly people living alone. - In a recently accepted TKDE paper, we describe a novel method for measuring similarity between sequences, called Move-Split-Merge (MSM) distance. In experiments we showed that, for several datasets, MSM produces more accurate results than other similarity measures that are commonly used. - In our PVLDB 2012 paper, we propose a theoretical method that can be used for speeding up subsequence matching and that is applicable not only to a specific distance measure (such as dynamic time warping) but to a broader group of distance measures. Methods that can be applied to broader sets of distance measures are of interest, because they reduce the need to come up with customized methods for each individual distance measure. - In still unpublished work, we have developed a way to speed up the search in query-by-humming systems. BROADER IMPACTS: As mentioned earlier, our theoretical methods are relevant for several real-world applications. Sign spotting is such an application that would be particularly useful to sign language users, teachers, and students. Fall detection is another useful real-world application, especially in the case of monitoring elderly people living alone. Query-by-humming is another example of an application that is of interest to the general public. Some of our more theoretical methods, like the MSM distance and the method of our PVLDB 2012 paper, while they have not been applied to specific and interesting real-world applications, have the potential to prove useful to such applications. That potential stems from the fact that they are generally applicable to diverse types of data and diverse notions of similarity. Last but not least, the project has involved more than ten students, both graduate and undergraduate, providing them with funding and opportunities to engage in research and obtain a solid background in computer science, data mining, and computer vision. Also, the project has produced significant amounts of data, and significant amounts of code, that we have made publicly available for researchers and educators to use.

Agency
National Science Foundation (NSF)
Institute
Division of Information and Intelligent Systems (IIS)
Application #
0812601
Program Officer
Maria Zemankova
Project Start
Project End
Budget Start
2008-09-01
Budget End
2012-08-31
Support Year
Fiscal Year
2008
Total Cost
$281,001
Indirect Cost
Name
University of Texas at Arlington
Department
Type
DUNS #
City
Arlington
State
TX
Country
United States
Zip Code
76019