Many modern applications must contend with continuous, unbounded, possibly rapid and time-varying streams of data, in addition to or instead of the more traditional bulk-loaded, periodically updated data sets. In the presence of data streams, evaluating queries or other analyses over the data becomes a long-running, continuous process, in contrast to the one-time queries and analyses supported by traditional database systems. More generally, event-based, reactive processing as a computing paradigm is becoming increasingly prevalent across a wide variety of application areas. Continuous queries over data streams are an intuitive and effective approach to data management within this paradigm. Data-intensive applications of this type include network traffic monitoring and engineering, analysis of sensor network data, monitoring of call-detail records or charge-card transactions for fraud detection, online analysis of financial data, and manufacturing monitoring and control, among others.

The original work focused on basic functionality of a streaming data prototype, but additional issues have become evident as applications have been deployed. These include query processing issues, such as approximate techniques for handling system overload; performance issues arising from scalability problems; and issues related to the distributed nature of the data processing. In addition, the project will explore new work on algorithms for transparent querying of multiple web services and for processing streams of approximate or uncertain data. Because stream applications frequently maintain layers of derived data products, it is particularly important to capture the connections between derived and raw data to ensure effective data maintenance and analysis.

The history and scope of the STREAM project, coupled with the current popularity of the research area in general, position the proposed work to have wide impact on both research and education. The project provides Ph.D.-level research and education for numerous graduate students, as well as research experience for a number of undergraduates. The graduate students give presentations and demonstrations of their work at large forums, with audiences of up to hundreds of people. Research papers from the first phase of the project are being studied in graduate classes at several institutions, and the PIs have been organizing biannual meetings that bring together several data streams research groups. The STREAM project initiated and curates a repository of data stream application specifications, and will continue to expand this repository as an important part of the proposed work. Finally, the software developed is made available for download, and a convenient web-accessible server is provided so that anyone can try out the system without the inconvenience of downloading and installing the software.

Agency: National Science Foundation (NSF)
Institute: Division of Information and Intelligent Systems (IIS)
Application #: 0414762
Program Officer: Sylvia J. Spengler
Project Start:
Project End:
Budget Start: 2005-09-15
Budget End: 2010-08-31
Support Year:
Fiscal Year: 2004
Total Cost: $949,697
Indirect Cost:
Name: Stanford University
Department:
Type:
DUNS #:
City: Palo Alto
State: CA
Country: United States
Zip Code: 94304