The Internet is one of the most complex distributed software infrastructures in existence, made up of a vast intertwining of systems and protocols implemented over an enormous collection of routers and servers. Unfortunately, the extreme complexity of this software leads to a rich variety of hard-to-isolate failure modes and anomalies, turning the operation of a modern large-scale networked system into a constant process of finding and fixing problems. Research on debugging modern networked systems has thus far focused on "removing the human from the loop" by automatically detecting problems that violate predefined correctness conditions. In practice, however, the enormous complexity of networks, coupled with the fundamental need for domain-specific knowledge to localize problems, limits the applicability of existing research and leaves debugging today a painstakingly manual process.

In this work, the researchers take a different approach, adopting the position that manual labor is a necessary evil of debugging networked systems, but that this process would be vastly simpler with in-network support for debugging. In this vein, the work develops techniques and tools for interactive debugging of wide-area networked systems. Networked software presents new challenges for interactive debugging: determining an ordering and global view of events despite high dynamism and extremely large scale; enabling the troubleshooter to interactively understand underlying problems from inherently massive and potentially incomplete sets of observations; localizing problems in the presence of competitors and adversaries that may subvert or limit the information available to the debugger; and isolating the operational network's performance from the debugging process.

Intellectual Merit: This project will design the first interactive debugging system for modern networked systems. This work will make significant contributions to network architecture and protocol design, including: (a) a network-layer substrate that allows for tight controls on network execution, to provide reproducibility and performance isolation of the live network in highly distributed and dynamic environments; (b) extensions to support debugging in untrusted environments, to localize malicious behavior and to diagnose faults without requiring revelation of private inputs; (c) analytical models to fundamentally understand the level of diagnostics achievable in existing systems, and to redesign existing protocols for diagnosability; and (d) a characterization of faults in modern networks, and of human factors that slow the debugging process or harm diagnostic precision. This work will also produce new software and tools for interactive debugging of networked systems that will be made open-source.

Broader Impact: Network and service providers today spend billions of dollars hiring armies of highly skilled developers and troubleshooters. More efficient troubleshooting reduces network downtime, improving the reliability and cost-effectiveness of networking technologies. Networks that can be rapidly repaired after failures are an essential component of disaster survival and recovery for business and government communication systems. Simplifying network troubleshooting can also accelerate deployment of networks in underdeveloped regions that lack experienced technicians. Graduate students will benefit from industrial interactions through the researcher's ongoing six-year collaboration with AT&T Labs and ongoing interactions with Yahoo! Labs and Cisco Research. Results identifying which components of systems and protocols are most prone to misunderstanding, and hence to human error and debugging difficulty, will be applied to reduce such misunderstandings in undergraduate networking classes.

Agency: National Science Foundation (NSF)
Institute: Division of Computer and Network Systems (CNS)
Application #: 1053781
Program Officer: Darleen Fisher
Budget Start: 2011-02-01
Budget End: 2017-01-31
Fiscal Year: 2010
Total Cost: $472,179
Organization: University of Illinois Urbana-Champaign
City: Champaign
State: IL
Country: United States
Zip Code: 61820