A plausible explanation for the persistent low utilization of data centers (around 18% by Gartner reports) is the managerial need to maintain quality of service against the well-known Latency Long Tail problem, where some apparently random requests that normally return within milliseconds would suddenly take multiple seconds. The latency long tail problem arises at moderate utilization levels (e.g., 50%) with all resources far from saturation. Despite the efforts to remedy the latency long tail problem in various ways, its causes have remained elusive: In most cases, the very requests that took several seconds actually return within milliseconds when executed by themselves. Studying and solving the latency long tail problem will contribute to better utilization while maintaining quality of service, leading to lower costs for cloud users, higher return on investment for cloud providers, and lower power consumption for the environment. The main goal of this project is the investigation of the class of very short bottlenecks, in which the CPU becomes saturated only for a small fraction of a second, as a significant cause of latency long tail problems. Despite their short lifespan, very short bottlenecks can lead to significant response time increases (several seconds) by propagating queuing effects up and down the request chain in an n-tier application system because of strong dependencies among the tiers during request processing.
This project runs large scale experiments in clouds and simulators to generate extensive fine-grain monitoring data in the investigation of very short bottlenecks, which are virtually invisible under typical performance monitoring tools with sampling periods of seconds or minutes. To match the time scale of very short bottlenecks, special instrumentation software tools are being refined to sample intra-server resource utilization at millisecond resolution and timestamp inter-server messages at microsecond resolution. Preliminary studies of n-tier application benchmarks with naturally bursty workloads have found very short bottlenecks that cause latency long tail in several system layers: systems software (JVM garbage collection), processor architecture (dynamic voltage and frequency scaling), and consolidation of applications in virtualized cloud environments. They show the potential for many other sources of very short bottlenecks, e.g., kernel daemon processes that use 100% of CPU for several milliseconds. Through careful distributed event analysis of the experimental data, new kinds of very short bottlenecks can be discovered, verified, reproduced, and studied in detail. Concrete solutions for specific very short bottlenecks have been developed, e.g., an improved Java garbage collector. However, other very short bottlenecks have no specific bug-fixes, e.g., those created by consolidated workload overlapping bursts of statistical nature. As an alternative to bug-fixes, more general solutions that disrupt queuing propagation are being explored. As a concrete example, instead of using a classic request/response approach, where waiting threads participate in the queuing propagation, asynchronous requests with notification of responses to reduce overall queuing is being investigated as a potential solution to eliminate or reduce the impact of several kinds of very short bottlenecks.