Applications as diverse as manufacturing, medicine, earth science, finance, and entomology generate massive amounts of temporal or spatio-temporal data. More specifically, geo-referenced information about moving objects, events, and atmospheric measurements may be derived from high-resolution satellites, sensors, ground and aerial imagery, GPS, and RFID. Such data present challenges to current approaches for mining time series data. For example, shape-based similarity measures used for classification and clustering consistently fail to produce satisfactory results for long sequences and for trajectories of objects moving in 2D or 3D space, which may exhibit similar motion patterns but differ in location and orientation. In addition, algorithms for finding frequent patterns and anomalies assume known, fixed pattern lengths. This project aims to address the limitations of current approaches to time series data analysis by adapting statistical language processing algorithms and approaches. Specifically, fast algorithms for learning context-free grammars can expose hierarchical structure in time series, and thus enable efficient discovery of variable-length patterns and facilitate human understanding of time series structure. Also, using the hierarchy to populate a "bag of patterns" can result in significantly more effective similarity measures for long time series, much as the familiar bag-of-words representation of documents is effective for a variety of similarity-based language processing tasks on massive corpora.
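
To make the "bag of patterns" idea concrete, here is a minimal sketch and not the project's actual method. It assumes equal-width binning to turn a series into symbols and a greedy pair-replacement scheme (in the style of Re-Pair) as a stand-in for whatever grammar learner the project uses; the function names `discretize`, `induce_grammar`, `bag_of_patterns`, and `cosine_similarity` are hypothetical.

```python
from collections import Counter
import math


def discretize(series, n_symbols=3):
    """Map each value to a letter by equal-width binning (an assumed scheme)."""
    lo, hi = min(series), max(series)
    width = (hi - lo) / n_symbols or 1.0  # constant series: avoid zero width
    letters = "abcdefghij"
    return [letters[min(int((x - lo) / width), n_symbols - 1)] for x in series]


def induce_grammar(symbols):
    """Greedy pair replacement: a simple stand-in for a grammar learner.

    Repeatedly replaces the most frequent adjacent pair with a new rule.
    Returns the compressed sequence and a dict mapping each rule name to
    the string of terminal symbols it expands to.
    """
    seq = list(symbols)
    rules = {}
    while True:
        pairs = Counter(zip(seq, seq[1:]))
        if not pairs:
            break
        (left, right), count = pairs.most_common(1)[0]
        if count < 2:
            break  # no pair repeats, so no further structure to extract
        name = f"R{len(rules)}"
        # Expand nested rules so every pattern is stored as terminals.
        rules[name] = rules.get(left, left) + rules.get(right, right)
        out, i = [], 0
        while i < len(seq):
            if i + 1 < len(seq) and seq[i] == left and seq[i + 1] == right:
                out.append(name)
                i += 2
            else:
                out.append(seq[i])
                i += 1
        seq = out
    return seq, rules


def bag_of_patterns(series, n_symbols=3):
    """Count how often each variable-length grammar pattern occurs."""
    symbols = discretize(series, n_symbols)
    _, rules = induce_grammar(symbols)
    word = "".join(symbols)
    return Counter({p: word.count(p) for p in set(rules.values())})


def cosine_similarity(a, b):
    """Cosine similarity between two pattern-count bags."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0
```

Because binning is relative to each series' own minimum and maximum, two series with the same shape at different offsets produce the same symbols and hence the same bag, which loosely mirrors the abstract's concern with patterns that match in shape but differ in location. The patterns in the bag have different lengths because rules nest, rather than coming from a fixed window size.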

Given the ubiquitous nature of time series data, advances in algorithms that can help uncover the structure of such data are likely to impact a broad range of applications. All of the results of this research, including publications, algorithms, and software, would be made freely available to the broader research and educational community. The project offers enhanced research-based training opportunities for graduate and undergraduate students. The project leverages existing programs at George Mason University and the University of Maryland at Baltimore County to increase the participation of women and members of other groups that are under-represented in Computer Science.

Agency: National Science Foundation (NSF)
Institute: Division of Information and Intelligent Systems (IIS)
Type: Standard Grant (Standard)
Application #: 1218318
Program Officer: Sylvia Spengler
Project Start:
Project End:
Budget Start: 2012-09-01
Budget End: 2016-08-31
Support Year:
Fiscal Year: 2012
Total Cost: $250,000
Indirect Cost:

Name: University of Maryland Baltimore County
Department:
Type:
DUNS #:
City: Baltimore
State: MD
Country: United States
Zip Code: 21250