Computing is now an essential tool and a catalyst for innovation across all aspects of human endeavor, including healthcare, education, science, commerce, government, and entertainment. An increasing amount of computing is performed on private and public clouds, primarily because of the cost and scalability benefits for both the end users and the operators of the warehouse-scale systems that host these clouds. We have come to expect such systems to provide millions of users with instantaneous, personalized, and contextual access to petabytes of data. The goal of this project is to improve the capabilities and efficiency of warehouse-scale systems. Specifically, we aim to resolve the presumed incompatibility between low-latency processing at massive scale and efficiency in energy consumption and resource usage: we aim to improve the energy and resource efficiency of warehouse-scale systems by factors of 2x to 5x while preserving low-latency processing at scale. Equally important, we aim to improve our understanding of the tradeoffs among scalability, low latency, and energy and resource efficiency in modern computing systems.

The project focuses on on-line, data-intensive workloads, such as search, social networking, real-time analytics, and machine learning analysis, that occupy thousands of servers in warehouse-scale systems and pose significant scaling challenges. Their strict latency constraints, large state requirements, and high communication fan-out make it difficult to apply known techniques for power reduction and for sharing resources across applications. Hence, these workloads typically consume a significant percentage of peak power and run on non-shared servers even during the frequent periods of medium or low user traffic; to a large extent, low latency and high efficiency are considered incompatible for them. To bridge this gap, the project uses a cross-layer approach that monitors end-to-end workload performance and quality-of-service to guide system-wide power and resource management. The first step is to develop a power management system that improves the energy proportionality of warehouse-scale systems during periods of low or medium load without compromising latency guarantees. The second step is to develop a system-wide resource manager that allows aggressive server sharing between latency-critical and other workloads during those same periods, again without compromising latency guarantees. The third step is to design operating system policies that provide performance isolation between workloads co-located on a server. The final step is to use the insights from the previous steps to evaluate the efficacy of existing and proposed server architectures with respect to energy and resource efficiency for on-line, data-intensive workloads.
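As one illustration of the cross-layer feedback idea described above, the sketch below shows a minimal latency-driven controller in Python. It is not part of the project's actual systems; the telemetry stub (read_p99_latency_ms), the 99th-percentile latency target, the watermark thresholds, and the DVFS frequency steps are illustrative assumptions. The controller compares measured tail latency against the target: when there is ample slack it lowers the CPU frequency cap (via the standard Linux cpufreq sysfs interface) and admits best-effort co-runners, and when latency approaches the target it restores frequency and evicts sharers.

    # Minimal sketch of a latency-driven power/sharing controller.
    # Targets, thresholds, frequency steps, and telemetry are hypothetical.

    import glob
    import random  # stands in for a real end-to-end telemetry source
    import time

    SLO_MS = 10.0       # hypothetical 99th-percentile latency target
    HIGH_WATER = 0.95   # back off power savings above 95% of the target
    LOW_WATER = 0.65    # reclaim slack below 65% of the target
    FREQ_STEPS_KHZ = [1_200_000, 1_600_000, 2_000_000, 2_400_000]  # example DVFS levels

    def read_p99_latency_ms() -> float:
        """Hypothetical stand-in for end-to-end tail-latency telemetry."""
        return random.uniform(4.0, 12.0)

    def set_cpu_frequency_cap(khz: int) -> None:
        """Cap CPU frequency via the Linux cpufreq sysfs interface (needs root)."""
        for path in glob.glob("/sys/devices/system/cpu/cpu*/cpufreq/scaling_max_freq"):
            try:
                with open(path, "w") as f:
                    f.write(str(khz))
            except OSError:
                pass  # in a sketch, skip cores we cannot write (not privileged)

    def control_loop() -> None:
        level = len(FREQ_STEPS_KHZ) - 1  # start at full speed
        allow_best_effort = False        # co-location gate for batch work
        while True:
            p99 = read_p99_latency_ms()
            headroom = p99 / SLO_MS
            if headroom > HIGH_WATER:
                # Latency is near the target: raise frequency, evict sharers.
                level = min(level + 1, len(FREQ_STEPS_KHZ) - 1)
                allow_best_effort = False
            elif headroom < LOW_WATER:
                # Ample slack: lower frequency and admit best-effort co-runners.
                level = max(level - 1, 0)
                allow_best_effort = True
            set_cpu_frequency_cap(FREQ_STEPS_KHZ[level])
            print(f"p99={p99:.1f}ms cap={FREQ_STEPS_KHZ[level]}kHz share={allow_best_effort}")
            time.sleep(1.0)

    if __name__ == "__main__":
        control_loop()

A production controller of the kind the project proposes would draw on richer end-to-end quality-of-service signals and coordinate additional server resources (cache, memory bandwidth, network), but the same feedback structure applies.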

Agency: National Science Foundation (NSF)
Institute: Division of Computer and Network Systems (CNS)
Type: Standard Grant (Standard)
Application #: 1422088
Program Officer: Marilyn McClure
Project Start:
Project End:
Budget Start: 2014-08-01
Budget End: 2018-07-31
Support Year:
Fiscal Year: 2014
Total Cost: $466,783
Indirect Cost:
Name: Stanford University
Department:
Type:
DUNS #:
City: Stanford
State: CA
Country: United States
Zip Code: 94305