As critical applications converge onto the Internet, effective response to large-scale network dynamics such as failures and demand spikes is becoming more important. A major portion of the time spent handling such dynamics goes to determining how to respond, a task that is mostly performed manually in current practice. Typically, only experienced human administrators can quickly find a close-to-optimal response. However, as networks grow larger and more diverse, managing an online operational network and attaining effective responses requires meta-tools that can swiftly learn and characterize the network.
This project will develop tools for automated management of a running network by framing heuristic optimization, empirical learning, experimental design, and network management within a simulation interface. The project will develop an online management and experimentation system for large-scale networks, in an environment that lets trainee administrators explore what-if scenarios without putting network operation at risk. The project will also develop algorithms for empirical characterization of network dynamics, as well as tools for quick, close-to-optimal configuration of numerous network parameters in response to failures or customer traffic trends.
The project will integrate concepts from behavioral science into the practice of operational network management. Automated management based on online optimization may establish a foundation for managing multi-owner systems, e.g., power grid, transportation, and water infrastructure networks. The project's heuristic optimization and experimental design methods, as well as its approach to operator training, are applicable to training in safety- and mission-critical industries where mistakes by ill-trained administrators are intolerable, e.g., airline pilot and nuclear reactor administrator training.