The Digital Age has brought about unprecedented growth in the amount of data being generated, the number of data consumers, and the diversity of their interests and locations. Traditionally, users poll sources for information, but for many applications polling scales poorly and may miss important events. The alternative offered by publish/subscribe systems is to push notifications to users with matching interests. This approach suits many application domains, from personal, commercial, and medical to environmental, military, and security. However, traditional publish/subscribe systems are becoming inadequate for advanced applications, where users want to receive information that has been filtered, joined, and summarized, and only when certain conditions are met.
This project aims to build a next-generation publish/subscribe system that meets these new challenges. The PIs propose an end-to-end solution with techniques ranging from subscription processing and indexing to dissemination network design. These techniques work together to support efficient and powerful subscription functionality, allowing users to control precisely what they want and when they want it.
One main feature distinguishing the proposed approach from previous work is the joint consideration of subscription processing and notification dissemination. Traditionally, these problems have been considered separately by the database and networking communities. However, there exists a wide spectrum of interesting alternatives for interfacing processing with dissemination. The PIs propose a promising approach that allows complex, stateful subscriptions to be handled by simple, stateless dissemination mechanisms, with a clean system design that is easy to implement and scale. A cost-based optimizer, inspired by database query optimization, chooses the best processing and dissemination strategies jointly and dynamically.
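To make the idea of handling stateful subscriptions with a stateless dissemination mechanism concrete, the following is a minimal sketch and not the PIs' actual design. It assumes a hypothetical subscription type ("alert me when a sensor's reading exceeds its recent average by more than my threshold"). All class names, the window size, and the message fields are illustrative assumptions. The processing layer holds the state (recent readings) and reduces each event to a derived message; the dissemination layer only checks a simple predicate on that message and remembers nothing about past events.

```python
from collections import defaultdict, deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Subscription:
    """Hypothetical subscription: alert when a sensor's reading exceeds
    its recent average by more than `threshold`."""
    subscriber: str
    sensor: str
    threshold: float


class StatefulProcessor:
    """Keeps per-sensor history and reduces each event to a derived message."""

    def __init__(self, window=3):
        # One bounded window of recent readings per sensor (the only state).
        self.history = defaultdict(lambda: deque(maxlen=window))

    def process(self, sensor, reading):
        past = self.history[sensor]
        message = None
        if past:
            # Deviation of the new reading from the average of earlier ones.
            deviation = reading - sum(past) / len(past)
            message = {"sensor": sensor, "deviation": deviation}
        past.append(reading)
        return message


class StatelessDisseminator:
    """Matches each message against simple predicates; keeps no event state."""

    def __init__(self, subscriptions):
        self.by_sensor = defaultdict(list)
        for sub in subscriptions:
            self.by_sensor[sub.sensor].append(sub)

    def recipients(self, message):
        candidates = self.by_sensor.get(message["sensor"], [])
        return sorted(
            sub.subscriber
            for sub in candidates
            if message["deviation"] > sub.threshold
        )


def run(events, subscriptions, window=3):
    """Feed (sensor, reading) events through both layers in order."""
    processor = StatefulProcessor(window)
    network = StatelessDisseminator(subscriptions)
    notifications = []
    for sensor, reading in events:
        message = processor.process(sensor, reading)
        if message is not None:
            targets = network.recipients(message)
            if targets:
                notifications.append((sensor, reading, targets))
    return notifications
```

In this sketch the per-subscriber work reduces to a threshold comparison on one derived attribute, which is the kind of simple predicate a stateless content-based network can evaluate. Which derived messages to compute, and where, is the sort of decision the cost-based optimizer described above would make; that choice is not modeled here.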
Besides system building, this project tackles many new algorithmic challenges, including scalably processing a large number of complex subscriptions; exploiting event and subscription characteristics to combat worst-case complexity; balancing semantic similarity and network proximity in dissemination network design; and efficiently maintaining statistics for high-dimensional events and subscriptions.
Broader Impact:
Bringing together their expertise in databases and algorithms, the PIs have a track record of collaborating with each other and with researchers outside computer science. A planned application of the system is to help ecologists with environmental monitoring.
The PIs are committed to integrating research and education, particularly undergraduate research, and will continue their tradition of involving undergraduates through REU and department fellowships. The PIs are collaborating on an effort, supported by the Department of Education, to increase workforce diversity for women, minorities, and persons with disabilities. They also participate in an internship program for underrepresented groups, which provides minority students with opportunities to perform paid summer research with Duke faculty members.
Project URL: www.cs.duke.edu/dbgroup/prosem/