Accurate traffic measurement and monitoring are critical for network management, operation, and control. With the rapid growth of the Internet, network link speeds increase every year to accommodate more users, and measuring and monitoring the traffic on such high-speed links has become a very challenging problem. Data streaming, first introduced in the database area, has been touted as a viable solution. Data streaming is concerned with processing a long stream of data items in one pass, using a small working memory, in order to answer a query regarding the stream. The challenge is to use this small memory to "remember" as much information pertinent to the query as possible. However, traditional data streaming algorithms, designed mostly for database applications, are not suitable for future network environments, where large volumes of data flowing through numerous high-speed links need to be monitored and controlled. These algorithms are typically designed to process a single stream of data for a single specific type of query, and most of them cannot operate at very high link speeds.
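As an illustration of the one-pass, small-memory model described above, the following is a minimal sketch of the classic Misra-Gries frequent-items summary. It is a textbook technique chosen here only as an example, not an algorithm from this project; the function name `misra_gries`, the parameter `k`, and the sample `packets` list are assumptions made for the illustration.

```python
def misra_gries(stream, k):
    """One-pass frequent-items summary that keeps at most k - 1 counters.

    Any item occurring more than n / k times in a stream of n items is
    guaranteed to survive in the summary; each stored count underestimates
    the true count by at most n / k.
    """
    counters = {}
    for item in stream:
        if item in counters:
            counters[item] += 1
        elif len(counters) < k - 1:
            counters[item] = 1
        else:
            # No free counter: decrement every counter and drop those at zero.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters


# Hypothetical stream of flow identifiers observed on one link.
packets = ["a"] * 6 + ["b"] * 3 + ["c", "d", "e"]
summary = misra_gries(packets, k=3)
```

The memory used depends only on `k`, not on the length of the stream, which is the property that makes this style of algorithm attractive for link monitoring. A second pass, or a tolerance for the bounded underestimate, is needed to obtain exact counts.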

In this project, the principal investigator (PI) will investigate novel paradigms and mechanisms for performing large-scale distributed data streaming on tens of thousands of high-speed links and nodes, and for aggregating, compressing, and interpreting the streaming results, for better measurement and monitoring of large networks. These paradigms and mechanisms will be designed to address the important network monitoring and measurement problems that traditional data streaming algorithms are not equipped to handle. Building on preliminary work, the PI plans to conduct research in two intellectual themes of data streaming, each targeting a deficiency of existing data streaming algorithms. The first research theme is to design data streaming algorithms that can operate at link speeds of 40+ Gbps and still provide high accuracy, and to investigate how to achieve multiple data streaming goals in a much more resource-efficient way than simply combining individual streaming algorithms designed for those goals. The second research theme is to design distributed data streaming algorithms that can identify global patterns (e.g., globally frequent items) in aggregate traffic over many high-speed links, without merging the traffic into a single stream. These two research themes are closely related and constitute a holistic effort.
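To make the second theme concrete, the following is a minimal sketch of one well-known way to estimate global item counts without merging traffic: each link maintains a Count-Min sketch with identical dimensions and hash functions, and a collector adds the sketches element-wise. This is a generic illustration and not necessarily the approach this project will take; the class `CountMinSketch`, its parameters, and the sample flow keys are all assumptions.

```python
import hashlib


class CountMinSketch:
    """Fixed-size counter array; estimates never fall below the true count."""

    def __init__(self, width=256, depth=4):
        self.width = width
        self.depth = depth
        self.rows = [[0] * width for _ in range(depth)]

    def _index(self, row, key):
        # Deterministic per-row hash so that every link maps keys identically.
        digest = hashlib.blake2b(
            key.encode("utf-8"), digest_size=8, salt=row.to_bytes(8, "little")
        ).digest()
        return int.from_bytes(digest, "little") % self.width

    def add(self, key, count=1):
        for r in range(self.depth):
            self.rows[r][self._index(r, key)] += count

    def estimate(self, key):
        return min(self.rows[r][self._index(r, key)] for r in range(self.depth))

    def merge(self, other):
        """Return a new sketch summarizing the union of both streams."""
        if (self.width, self.depth) != (other.width, other.depth):
            raise ValueError("sketch dimensions must match")
        merged = CountMinSketch(self.width, self.depth)
        for r in range(self.depth):
            merged.rows[r] = [a + b for a, b in zip(self.rows[r], other.rows[r])]
        return merged


# Hypothetical traffic on two links; only the sketches reach the collector.
link_a_traffic = ["10.0.0.1"] * 40 + ["10.0.0.7"] * 5
link_b_traffic = ["10.0.0.1"] * 35 + ["10.0.0.9"] * 8

link_a, link_b = CountMinSketch(), CountMinSketch()
for flow in link_a_traffic:
    link_a.add(flow)
for flow in link_b_traffic:
    link_b.add(flow)

merged = link_a.merge(link_b)
```

Because the sketch is linear, the merged sketch is identical to one built over the concatenated traffic, so the collector receives only `width * depth` counters per link rather than the packets themselves. A flow that is moderately frequent on each link but heavy in aggregate, such as `10.0.0.1` here, becomes visible only after the merge.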

The successful exploration of the issues involved in network data streaming will have significant scientific and engineering impact. The results will provide much better technology for measurement, monitoring, and management of large high-speed networks, making the future Internet infrastructure more controllable, scalable, and robust. The results and methodologies developed in this project may also help solve other network data streaming problems and may find applications in other fields such as databases.

Broader Impact: This project will engage both graduate and undergraduate students, and offer them research and learning experience not only in computer networking, but also in other fields such as statistics and information theory. This will enhance their mathematical and problem-solving skills, and make them more adaptable to the future challenges of networking and computing. This project will impact the graduate and undergraduate curriculum through the introduction of a new course on the principles of data streaming and its applications in networking and databases. This effort will help build a strong relationship between research and education. The proposed research will strengthen the ongoing collaboration between the PI and researchers at AT&T Labs Research, IBM, and Telcordia Research Lab, facilitating the transfer of scientific discoveries to application domains. The results will be broadly disseminated through invited talks and tutorials at well-attended conferences, organization of focused workshops, and open-sourcing of data streaming software developed for this project, in addition to publication of papers in leading conferences and journals. The PI will also continue to actively engage underrepresented groups in research and education.

Agency: National Science Foundation (NSF)
Institute: Division of Computer and Network Systems (CNS)
Application #: 0519745
Program Officer: Victor S. Frost
Project Start:
Project End:
Budget Start: 2005-09-01
Budget End: 2009-08-31
Support Year:
Fiscal Year: 2005
Total Cost: $286,000
Indirect Cost:
Name: Georgia Tech Research Corporation
Department:
Type:
DUNS #:
City: Atlanta
State: GA
Country: United States
Zip Code: 30332