This research develops models, algorithmic methods, and software solutions for tracking massive data streams in monitoring applications such as IP (Internet Protocol) network traffic analysis. Such monitoring applications are inherently distributed: they rely on correlating multiple streams, and therefore operate under severe communication constraints. In addition, massive streams are subject to the traditional constraints on storage and per-item processing time, which apply even in the centralized or single-stream case. Under such severe constraints, monitoring is necessarily approximate. This research project develops principled methods for performing essential monitoring tasks on distributed streams when all of these constraints apply at once. In particular, new methods are developed that trade off accuracy of analysis to meet communication, space, and time constraints.
Distributed monitoring of massive data streams arises in many communication systems, primarily in security applications. The resulting models and solutions address such applications and yield a better understanding of how to perform detailed data analyses within existing resource constraints. This research is carried out in collaboration with industry researchers (Minos Garofalakis and Rajeev Rastogi of Lucent), who bring extensive knowledge of stream data mining and provide data sets for testing the new algorithms for approximate distributed stream tracking. The industrial participation also increases the project's impact via technology transfer. Studying the algorithmic, database, and networking aspects of the problem jointly will lead to significant new insights and training. Solutions and the resulting software programs will be made freely available via the project's Web site (www.cs.rutgers.edu/~muthu/adst.html).