Troubleshooting is an inherent part of network operation: no matter how well networks are designed, something eventually fails, and in large networks, failures are ever-present. In the past, troubleshooting has mostly relied on ad hoc techniques cobbled together as afterthoughts. However, both the importance and the difficulty of troubleshooting have intensified as networks have become crucial, ubiquitous components of modern life, while at the same time their size and complexity continue to grow. These twin pressures highlight the urgent need to integrate troubleshooting as a first-class citizen when developing a network architecture.
Armed with a set of general principles, we are developing key building blocks for troubleshooting. Troubleshooting tasks span a vast range of problems and can follow many different diagnostic paths. As such, the approaches we are developing are not limited to specific domains (e.g., web surfing performance or routing connectivity failures) but are designed to be generally applicable across architectures and evolvable as a network's structure and uses inevitably change. The goal is for our experiences both to inform broader communities, such as NSF's Future Internet Design (FIND) community, about ways to weave troubleshooting into new architectures, and in turn to draw requirements and synergistic insights from these other architectural efforts.
The health and failure of networks have broad societal impact, so improvements and breakthroughs in troubleshooting will have a correspondingly broad impact. Because troubleshooting has such a wide reach, the privacy of information becomes a consideration; we are therefore developing privacy-aware principles throughout our work.