This project addresses computing clouds: large-scale shared infrastructures that offer practically unlimited hardware to most users and applications. To achieve scalable performance, all components of the system, from the hardware through the operating system, middleware, and various servers to the application itself, must cooperate; a bottleneck in any one component can slow down the entire system. In traditional computer systems (e.g., as modeled by queuing theory), a typical assumption is that workloads consist of independent jobs. This assumption, valid for old-style batch-oriented processing and interactive users, guarantees that a single bottleneck appears for the entire system. Single bottlenecks are relatively easy to detect, since they appear as resources reaching saturation (e.g., 100% utilization).
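Under the independent-jobs assumption, bottleneck detection reduces to scanning average utilization for a saturated resource. A minimal sketch of that idea (the resource names, sample values, and 95% threshold are illustrative assumptions, not project data):

```python
# Single-bottleneck detection sketch: under the independent-jobs model,
# the bottleneck shows up as one resource near 100% average utilization.

def find_saturated(avg_utilization, threshold=0.95):
    """Return the resources whose average utilization reaches the threshold."""
    return [name for name, u in avg_utilization.items() if u >= threshold]

# Hypothetical per-resource averages for a three-tier deployment:
samples = {"web_cpu": 0.35, "app_cpu": 0.42, "db_cpu": 0.99, "db_disk": 0.60}
print(find_saturated(samples))  # ['db_cpu']
```

The multi-bottlenecks discussed below are precisely the cases this simple scan misses: no single entry ever approaches the threshold.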
The "independent jobs" model does not hold for the important class of web-facing applications (e.g., e-commerce) that rely on the popular n-tier architecture. N-tier systems divide the system into a pipeline of processing components, typically web servers, application servers, and database servers. While the n-tier architecture supports good performance scalability at the web server and application server tiers, it also introduces several (sometimes unexpected) strong dependencies among tiers and components. These dependencies produce an interesting phenomenon called a multi-bottleneck. Multi-bottlenecks are characterized by system throughput that hits a ceiling regardless of additional hardware, while no single resource shows average utilization anywhere near saturation. (Anecdotally, this is an increasingly common situation in practice.) Multi-bottlenecks are difficult to find, diagnose, and remove with traditional performance evaluation methods. They are also important in clouds, since they will be the only bottlenecks left after the removal of easily spotted single bottlenecks.
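The defining signature, throughput capped while every average looks healthy, can be illustrated with two synthetic utilization traces whose saturation bursts alternate (the traces and 100ms resolution are invented for illustration):

```python
# Hypothetical 100ms-resolution CPU utilization traces for two dependent
# tiers. Each tier saturates only in short, alternating bursts, so neither
# average approaches 100%, yet at every instant one tier is the bottleneck.

app_cpu = [1.0, 0.2, 1.0, 0.1, 1.0, 0.2, 0.1, 1.0, 0.2, 0.1]
db_cpu  = [0.1, 1.0, 0.2, 1.0, 0.1, 1.0, 1.0, 0.2, 1.0, 1.0]

avg_app = sum(app_cpu) / len(app_cpu)   # ~0.49: looks far from saturated
avg_db  = sum(db_cpu) / len(db_cpu)     # ~0.66: also looks far from saturated

# Fraction of intervals in which at least one tier is saturated:
capped = sum(1 for a, d in zip(app_cpu, db_cpu) if max(a, d) >= 1.0)
capped_fraction = capped / len(app_cpu)
print(avg_app, avg_db, capped_fraction)
```

Here `capped_fraction` is 1.0: some tier is saturated in every interval, so adding hardware to either tier alone cannot raise throughput, yet a coarse utilization report shows both tiers at moderate load.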
This project develops, evaluates, and refines a systematic search method, called Telescoping, to find multi-bottlenecks by running large-scale experiments on production clouds. A simulator generates well-defined multi-bottlenecks to help refine the Telescoping search method and tune its parameters. Then, experiments with n-tier benchmarks such as RUBiS and RUBBoS (e-commerce applications) on production clouds such as Open Cirrus, Amazon EC2, and Emulab gather experimental evidence on multi-bottlenecks. These experiments shed light on a little-known phenomenon in a rich but largely unexplored area (the performance limits of jobs with dependencies). Success can lead to significant new developments in the theoretical understanding of jobs with dependencies and improve the practical use of clouds by n-tier systems.
One of the most interesting and puzzling trends in cloud computing is the persistently low utilization of machines in data centers (Gartner reports average utilization of around 18% through the late 2000s and early 2010s). Managerially, this is a suboptimal situation, since low utilization means low return on investment. Anecdotally, it has been known in the cloud community that the quality of service of mission-critical web-facing applications running in clouds and data centers can show significant perturbations (especially in the critical dimension of response time) even at moderate utilization levels, e.g., less than 50%. The main hypothesis of this project is that previously unconfirmed phenomena such as multi-bottlenecks are among the causes that perturb the quality of service of cloud-based applications. The major outcome of this project is a set of experiments that confirm this hypothesis. Using the automated experiment management tools from the Elba project, a large number of n-tier application benchmark runs were carried out in several data centers and clouds. These results confirm the existence and prevalence of transient bottlenecks, which last only a very short time (tens to hundreds of milliseconds) during the execution of n-tier benchmarks in data center environments. Through fine-grain monitoring (at a typical interval of 50ms), abundant experimental data show that such very short, transient bottlenecks can cause wide response time variations of up to several seconds for requests that usually take only a few milliseconds to return. Such long response time delays happen because of a causal chain of events in the n-tier system, triggered by a transient bottleneck that saturates a server CPU for tens to hundreds of milliseconds.
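Why 50ms monitoring matters can be shown with a synthetic trace (values invented for illustration): a 150ms saturation burst is plainly visible in the 50ms samples but vanishes into a modest one-second average, the granularity at which most monitoring tools report:

```python
# Sketch: the same 1-second CPU trace seen at 50ms vs 1s granularity.
# Twenty 50ms samples; three consecutive samples (150ms) sit at 100%.
fine = [0.2] * 8 + [1.0, 1.0, 1.0] + [0.2] * 9

peak_50ms = max(fine)                    # 1.0: transient bottleneck visible
one_second_avg = sum(fine) / len(fine)   # 0.32: bottleneck invisible at 1s

print(peak_50ms, one_second_avg)
```

A standard 1-second (or 1-minute) monitor would report this server at 32% utilization and never flag the 150ms saturation episode that, as described above, can trigger multi-second response time spikes.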
The rapid job arrival rate of typical web-facing application workloads (thousands of requests per second) quickly overwhelms important internal software resources such as the thread pool in a server (e.g., a web server or application server). The queue overflow propagates the waiting upstream, causing longer queues at the beginning of the n-tier application. This queue propagation process eventually saturates the main soft resources in a key server (usually the web server), causing dropped network packets and packet retransmissions after 3 seconds (the initial TCP retransmission timeout). In the attached figure, a sample chain of events linking a transient bottleneck to long response times is illustrated by a case study of the Java Virtual Machine garbage collector (JVM 1.5). On the left side, the top graph shows the occurrences of garbage collection episodes. The middle graph shows that the Tomcat application server CPU becomes saturated for a short time (between 100ms and 200ms), coinciding with garbage collection. The bottom graph shows that at the same time, the Tomcat queues become full (magenta curve at the bottom), causing the web server queue (Apache, blue curve at the top) to overflow, in a process we call queue amplification. On the right side, the top graph repeats the Apache queues for reference, and the middle graph shows the increase in system response time at the same time. The bottom right graph shows that the increase is due to a number of requests with very long response times caused by dropped packets and retransmissions. These chained events form a multi-bottleneck, since the queues in Tomcat and Apache are causally related. In this case, the highest average CPU utilization among all servers is only about 61%, since the processors are saturated only for short periods; at other times, the system is not busy at all, lowering the average CPU utilization. During the course of the project, several instances of transient bottlenecks at several system layers were confirmed experimentally.
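The queue amplification mechanism described above can be sketched as a toy simulation. All capacities, rates, and the stall window below are invented for illustration; the point is only the causal shape: a brief downstream stall backs work up into the upstream queue, and upstream overflow becomes dropped requests that each pay a roughly 3-second retransmission penalty.

```python
# Toy queue-amplification sketch: a downstream (Tomcat-like) server stalls
# briefly (e.g., during garbage collection); arrivals back up in the
# upstream (Apache-like) queue, and its overflow becomes dropped packets.
TOMCAT_CAP, APACHE_CAP = 4, 8   # illustrative queue capacities

def simulate(arrivals_per_ms, stall_window, total_ms=300):
    """Return the number of requests dropped at the upstream queue."""
    apache = dropped = 0
    for t in range(total_ms):
        apache += arrivals_per_ms
        if t not in stall_window:
            # Downstream is healthy: it drains up to TOMCAT_CAP per tick.
            apache -= min(apache, TOMCAT_CAP)
        if apache > APACHE_CAP:
            # Upstream overflow: excess requests are dropped, and each
            # drop costs ~3s of extra response time (TCP retransmission).
            dropped += apache - APACHE_CAP
            apache = APACHE_CAP
    return dropped

# A 150ms downstream stall is enough to overflow the upstream queue:
print(simulate(arrivals_per_ms=2, stall_window=range(100, 250)))  # > 0
print(simulate(arrivals_per_ms=2, stall_window=range(0, 0)))      # 0
```

With no stall the upstream queue never overflows; a stall of only 150ms produces hundreds of drops at this arrival rate, mirroring how millisecond-scale CPU saturation turns into multi-second response times.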
The example in the attached figure (Java garbage collection) resides in the system software layer. A second instance of transient bottlenecks was found in connection with DVFS (Dynamic Voltage and Frequency Scaling) in modern processors, in the processor architecture layer. A third instance of multi-bottlenecks was found in consolidated applications in a virtualized environment, where very short bottlenecks arise from occasionally overlapping busy periods in two virtual machines; this transient bottleneck appears in the application layer. These concrete examples show that transient bottlenecks (and multi-bottlenecks in general) are important problems that cause quality of service violations at relatively low average CPU utilization levels in cloud environments. In summary, this project succeeded in experimentally confirming the main hypothesis that multi-bottlenecks (transient bottlenecks in particular) are a major cause of quality of service perturbations in cloud environments at moderate utilization levels. This is a significant finding, since classic system performance analysis would not have predicted such a large impact on response time from very short bottlenecks. The fine time granularity of these events (on the order of tens of milliseconds) also creates the need for new monitoring facilities to enable the discovery of other very short bottlenecks and multi-bottlenecks.