The goal of this research project is to design and develop an efficient, robust RFID stream processing system that addresses the challenges of emerging RFID deployments, namely the mismatch between raw data and the information applications need, incomplete and noisy data, and high data volume, and that enables real-time tracking and monitoring. The project makes two main contributions. The first is a low-level interpretation and compression substrate over RFID streams. This substrate accurately interprets incomplete and noisy raw data, using probabilistic algorithms to infer the locations of unobserved objects and inter-object relationships. To handle high data volume, it performs interpretation online, which enables online compression by identifying and discarding redundant data. The second contribution is higher-level complex event processing, which addresses the data-information mismatch by encoding application information needs as event patterns and evaluating these patterns continuously over event streams. For complex event processing, the project provides a compact, expressive event language, theoretical underpinnings, automata-based mechanisms for efficient pattern evaluation over event streams, and techniques for robust processing over the event streams produced by low-level interpretation and compression. The project integrates research and education through curriculum development and the development of teaching and research labs, and broadens the participation of women and minorities in research through college outreach and CRA's distributed mentor program. The project's broader impacts will include the release of source code, simulators, datasets, and benchmarks to the research community via the project's Web site (http://rfid-streams.cs.umass.edu/), and technology transfer with potential applications in supply chain management, healthcare, pharmaceuticals, library management, and other domains.
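
The abstract mentions automata-based mechanisms for evaluating event patterns over streams. As a rough illustration only (this is not the project's SASE implementation), the Python sketch below evaluates a sequence pattern such as SEQ(SHELF, EXIT) over RFID readings. Each partial match ("run") is an automaton instance whose state is the index of the next expected event type; runs are kept per object, and runs that exceed a time window are discarded. The event tuple layout, the `SeqMatcher` and `Run` names, the event type names, and the rule that every occurrence of the first event type starts a new run are all hypothetical choices made for this example.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Hypothetical event layout: (event_type, object_id, timestamp).
Event = Tuple[str, str, int]


@dataclass
class Run:
    """One partial match: the automaton state (index of the next expected
    event type), the start time, and the events consumed so far."""
    state: int
    start_ts: int
    events: List[Event] = field(default_factory=list)


class SeqMatcher:
    """Hypothetical run-based automaton for SEQ(t0, ..., tn) on the same
    object within `window` time units. Events that do not match a run's next
    expected type are skipped by that run, and every occurrence of t0 starts
    a new run."""

    def __init__(self, pattern: List[str], window: int):
        self.pattern = pattern
        self.window = window
        self.runs: Dict[str, List[Run]] = defaultdict(list)  # object_id -> active runs

    def feed(self, event: Event) -> List[List[Event]]:
        etype, obj, ts = event
        matches: List[List[Event]] = []
        alive: List[Run] = []
        for run in self.runs[obj]:
            if ts - run.start_ts > self.window:
                continue  # window expired: discard this partial match
            if self.pattern[run.state] == etype:
                run.state += 1  # take the transition to the next state
                run.events.append(event)
                if run.state == len(self.pattern):
                    matches.append(run.events)  # accepting state reached
                    continue
            alive.append(run)  # run stays active (advanced or still waiting)
        if etype == self.pattern[0]:
            new_run = Run(state=1, start_ts=ts, events=[event])
            if len(self.pattern) == 1:
                matches.append(new_run.events)
            else:
                alive.append(new_run)
        self.runs[obj] = alive
        return matches


# Usage with made-up readings: SEQ(SHELF, EXIT) within 10 time units per tag.
matcher = SeqMatcher(["SHELF", "EXIT"], window=10)
stream = [("SHELF", "tag1", 1), ("SHELF", "tag2", 2),
          ("EXIT", "tag1", 5), ("EXIT", "tag2", 20)]
results = []
for e in stream:
    results.extend(matcher.feed(e))
print(results)  # [[('SHELF', 'tag1', 1), ('EXIT', 'tag1', 5)]]
```

Here tag2's EXIT reading arrives 18 time units after its SHELF reading, so its run expires and only tag1 produces a match. Keeping runs per object keeps each event's work proportional to the active runs for that object rather than to all runs in the system.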

Project Report

The goal of this project was to develop an efficient, robust stream processing system that enables real-time object tracking and event monitoring on high-volume data streams, in particular RFID-based object tracking and complex event processing. Research in this project led to a stream processing system that incorporates data cleaning and inference, probabilistic query processing, complex event processing, and cluster computing. Our research significantly advanced the state of the art in stream processing. Our main contributions fall into four related areas, each realized as a subsystem:

(1) SPIRE: data cleaning and inference. SPIRE translates raw, noisy sensor data streams into rich, queryable tuple streams that carry the attributes needed for query processing and characterize the uncertainty of these attributes using continuous probability distributions. It performs this transformation at stream speed, achieving orders-of-magnitude improvements in performance and scalability over existing data cleaning and inference techniques.

(2) CLARO: probabilistic processing of relational queries on uncertain data streams. CLARO efficiently captures the result uncertainty of evaluating relational queries on uncertain data streams modeled by continuous probability distributions, and returns only those results whose confidence meets a user-specified level. On data from RFID object tracking and computational astrophysics, its techniques outperform the best sampling methods in both accuracy and speed. They also allow a tornado detection system to reduce the number of output errors by two orders of magnitude while processing high-volume data at stream speed.

(3) SASE: complex event processing (CEP) to detect temporal sequence patterns on uncertain data streams. CEP extends the traditional set-based query processing paradigm with a temporal model and temporal sequence patterns. Our SASE+ language strikes a good balance between expressive power and complexity: it is expressive enough for all the complex event patterns that real applications require, yet small enough to permit efficient implementation. Our optimizations allow complex pattern queries, whose worst-case cost is exponential, to run at throughputs of hundreds of thousands to millions of events per second, outperforming state-of-the-art techniques by up to two orders of magnitude.

(4) SCALLA: scalable, low-latency processing using MapReduce. We developed a new Hadoop-based platform that enables low-latency, incremental query processing using a series of new hash techniques and optimizations. Evaluation on both a dedicated research cluster and Amazon EC2 showed that the platform allows reduce progress to keep up with map progress, thereby achieving incremental processing. The platform further reduced internal data I/O by up to three orders of magnitude compared to stock Hadoop.

The results of this project have broader scientific, social, and educational impacts. Evaluation results using real-world data sets from domains including RFID object tracking, tornado detection, and computational astrophysics provide direct evidence of the efficiency and effectiveness that the proposed techniques can offer applications in those domains. Results obtained from a real tornado detection system indicate that the ability to distill useful data from noisy data can enable tornado detection in real time and with much improved efficiency, which may have a large social impact in future deployments. Results produced in this project have also attracted significant interest for adoption in new big data domains such as genomics.
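
CLARO is described as returning only results whose confidence meets a user-specified level, for attributes modeled by continuous probability distributions. A minimal sketch of that filtering step follows, assuming each uncertain attribute is a Gaussian mixture and the query is a single selection predicate `x > c`. It computes P(x > c) in closed form from the normal CDF and keeps a tuple only if that probability reaches the threshold. The `UncertainTuple` type, the mixture representation, and `confident_select` are hypothetical, and the sketch does not reproduce CLARO's actual techniques for joins, aggregates, or other relational operators.

```python
import math
from dataclasses import dataclass
from typing import Iterable, Iterator, List, Tuple

# Assumed representation: a Gaussian mixture as (weight, mean, std_dev)
# components whose weights sum to 1.
Mixture = List[Tuple[float, float, float]]


def normal_cdf(x: float, mean: float, std: float) -> float:
    """CDF of N(mean, std^2), computed with the error function."""
    return 0.5 * (1.0 + math.erf((x - mean) / (std * math.sqrt(2.0))))


def prob_greater_than(dist: Mixture, c: float) -> float:
    """P(X > c) for X drawn from a Gaussian mixture, in closed form."""
    return sum(w * (1.0 - normal_cdf(c, m, s)) for w, m, s in dist)


@dataclass
class UncertainTuple:
    """Hypothetical stream tuple with one uncertain numeric attribute."""
    tag_id: str
    x: Mixture


def confident_select(stream: Iterable[UncertainTuple], c: float,
                     min_conf: float) -> Iterator[Tuple[str, float]]:
    """Evaluate the predicate `x > c` on each tuple and emit only results
    whose probability meets the user-specified confidence `min_conf`."""
    for t in stream:
        p = prob_greater_than(t.x, c)
        if p >= min_conf:
            yield t.tag_id, p


# Usage with made-up distributions.
stream = [
    UncertainTuple("tag1", [(1.0, 12.0, 1.0)]),
    UncertainTuple("tag2", [(0.5, 9.0, 1.0), (0.5, 11.0, 1.0)]),
    UncertainTuple("tag3", [(1.0, 8.0, 2.0)]),
]
results = list(confident_select(stream, c=10.0, min_conf=0.9))
```

With this sample stream, only tag1 passes the 0.9 threshold (P is about 0.977). tag2's two symmetric components give exactly 0.5, and tag3 gives about 0.159. Evaluating the predicate in closed form avoids Monte Carlo sampling for this simple case; handling general queries is where the project's own techniques come in.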
Beyond its research activities, the project also included education efforts: an integrated undergraduate and graduate curriculum on data management and statistical analysis, and outreach and mentoring activities to engage women in research.

Agency: National Science Foundation (NSF)
Institute: Division of Information and Intelligent Systems (IIS)
Application #: 0746939
Program Officer: Maria Zemankova
Budget Start: 2008-06-15
Budget End: 2014-05-31
Fiscal Year: 2007
Total Cost: $604,879
Organization: University of Massachusetts Amherst
City: Amherst
State: MA
Country: United States
Zip Code: 01003