Computer communication networks are evolving from convenient communication mechanisms offering "best-effort" service into key elements of our national infrastructure, on which we depend for both business and personal use. As a result, the need for improved fault management techniques is increasing, and the open problem of network fault management continues to grow in complexity. A major cause of this growing complexity is the heterogeneity of networks. Today's networks are composed of a variety of technologies (e.g., wireless, ATM, Ethernet) that must support a diverse set of networked applications (e.g., the WWW, electronic commerce, video conferencing). The fault management problem has been studied to varying degrees for different technologies, but many issues remain open, and the researcher is unaware of any methods that can be applied effectively across technologies. Current methods are largely ad hoc and rely on the expertise of a human network manager. Even within single-technology networks (e.g., Ethernet LANs), strategies developed for one network cannot easily be generalized to other networks. In addition, the rate of change within a network makes it difficult for the network manager to sustain the level of expertise needed to develop and maintain effective fault management strategies. New techniques that can be generalized and applied across technologies are needed to ensure network reliability efficiently. The long-term goal of this research is to move toward self-managing networks by automating as much of network management as possible. Within network management, the project will first focus on fault management, which encompasses a large, complex set of problems that have not been well studied or defined. Before methods for automation can be studied, the fault management problem must be better understood.
This research proposes a set of studies aimed at discovering aspects of the fault management problem that are common across many different types of networks. The proposed studies are primarily experimental: network fault insertion will be used to study the impact of various faults on a network. Faults will be inserted into simulated networks, a network testbed, and a campus network. The experiments will provide insight into fault propagation and allow the researcher to study how various aspects of the fault management problem change from fault to fault, with network dynamics, and across network types. Data collected from two operational networks (the CS Department network and Motorola wireless networks) will be analyzed and used to validate simulation accuracy. A set of benchmarks for testing and comparing fault management methods will be created and made publicly available. In the short term, the results can be used to develop fault management methods that (1) can be applied across different technologies, (2) can generalize from network to network, and (3) can adapt to changes in the network. Common traits discovered through experimentation can be used to formulate a theoretical foundation. This foundation, coupled with the availability of benchmarks, will allow researchers to systematically decompose widely applicable fault management problems and demonstrate progress on them. In the long term, the results can help us understand how to design network components for manageability.

Agency: National Science Foundation (NSF)
Institute: Division of Computer and Network Systems (CNS)
Type: Standard Grant (Standard)
Application #: 9984811
Program Officer: Darleen L. Fisher
Budget Start: 2000-07-01
Budget End: 2005-06-30
Fiscal Year: 1999
Total Cost: $429,991
Organization: Illinois Institute of Technology
City: Chicago
State: IL
Country: United States
Zip Code: 60616