Cloud computing has transformed how we approach computing. Its uses range from large-scale scientific computations, to e-commerce, to data-intensive applications, to web-hosting. This diversity is a measure of its success, but also represents one of the major challenges it faces. Cloud computing strength is through the sharing of computing resources, but sharing must be controlled. Imagine financial transactions or real-time medical imaging processing experiencing slow-downs because of, say, a high volume batch application. Current solutions control sharing at a very coarse level, e.g., the number of processor cores an application is entitled to, or how much memory or network bandwidth it can consume. This coarse control has been shown insufficient to offer strong response time guarantees. This project will design and implement solutions that address this short-coming, and in the process extend the cost and performance benefits of cloud computing to critical applications with real-time requirements. The project will provide a comprehensive understanding of the dependencies that exist among cloud computing resources (CPU, memory, network bandwidth, etc.), and devise scheduling and resource management solutions that account for those dependencies to provide fine grain control of the response time afforded to cloud users. A prototype implementation and a testbed will be used to demonstrate the applicability and feasibility of the solutions.
The project will explore the complex resource interactions that arise in virtualization technologies at the core of cloud computing solutions. The focus is on enforcing service guarantees for I/O operations in virtualized systems, as they play a critical role in delivering end-to-end computing guarantees. Specifically, the project will characterize the resource dependencies involved in I/O operations in a virtualized environment, and develop mechanisms to manage them effectively. Sample questions the project will investigate include the relationship between I/O performance and the number of CPU cores allocated to I/O operations, the overhead of assigning different threads to traffic from different VMs and traffic classes, the impact on performance of different interrupt handling strategies, etc. The implementation cost (overhead) of different scheduling and resource management mechanisms will also be quantified to enable informed trade-offs between efficiency and resource guarantees. A prototype implementation will be carried out in the Xen environment, and demonstrated over a local testbed consisting of eight 16-core computers with network connectivity ranging from 1Gbps to 40Gbps.
Broader Impacts: Cloud computing already plays a major role in our modern society, but it has the potential to grow even further in scale and scope (e.g., optimizing the efficiency of our electrical grid or controlling the flow of traffic to minimize congestion in urban settings). Realizing the cloud's full potential, however, requires that it evolves to offer predictable guarantees. This project will develop solutions towards realizing this goal, and in the process extend the benefits of cloud computing to new applications of vital importance to our society. Transfer of those solutions to commercially used cloud systems will be pursued through the delivery of modules aimed at the open-source Xen system (on which Amazon EC2 is based). The project will also target improving students' ability to leverage cloud computing. It will develop educational lab modules that give students direct hands-on experience with the complex interactions that arise in virtualized environments. The project will also be used to foster students' interest in cloud computing research by offering summer research opportunities to undergraduate students. Special emphasis will be given to recruiting women and under-represented minorities.