Declarative, database-style models for programming distributed applications are becoming widely adopted, in a variety of realms ranging from sensors to publish-subscribe to network state management. They free the developer to define high-level queries for the specific data of interest, without regard to details about data sources, communications protocols, or synchronization.

As this approach to programming gains momentum, there is an increasing need to abstract away low-level variations among stream data sources under a uniform representation, i.e., a view; and to integrate, i.e., conjoin, different types of stream data from large numbers of sources. Such tasks involve much more distributed communication and coordination than in traditional distributed databases or even data stream management systems. It becomes essential to compute the query in the network, and to optimize the processing of each stream (or small group of streams) separately, in a way that considers the topology of the network.

This proposal develops the technologies to support integration of data streams, including languages for stream schema mappings, focusing on issues relating to combining distributed messages and maintaining timing information; techniques for rapidly establishing query computation paths through a network, for sets of data stream elements that need to be joined and aggregated together; and offline and adaptive, network-aware query optimization techniques for distributed computation in the network. These techniques will scale across widely heterogeneous (sensor, wireless, and conventional) networks, and will be evaluated in environmental monitoring applications.

The intellectual merit is the development of new techniques for performing queries across large, highly distributed networks of stream-producing sources. This increases understanding of the adaptive query processing space when access costs to data items are non-uniform and query processing requires distributed communication, and of the trade-offs between offline and adaptive optimization and among different optimization granularities. The broader impact includes the development of distributed stream integration capabilities that can directly address a number of emerging and well-known challenges in the network and environmental monitoring domains. The educational component includes the training of two PhD students, and the teaching of stream data integration in graduate and advanced undergraduate courses.

Project URL: www.cis.upenn.edu/~zives/stream-integration/

Agency: National Science Foundation (NSF)
Institute: Division of Information and Intelligent Systems (IIS)
Application #: 0713267
Program Officer: Gia-Loi Le Gruenwald
Project Start:
Project End:
Budget Start: 2007-09-15
Budget End: 2011-08-31
Support Year:
Fiscal Year: 2007
Total Cost: $403,014
Indirect Cost:
Name: University of Pennsylvania
Department:
Type:
DUNS #:
City: Philadelphia
State: PA
Country: United States
Zip Code: 19104