Data centers form an integral part of today's internet-based computing infrastructure upon which society relies. Applications ranging from e-commerce and web servers to grid computing for scientific research use the computation and storage provided by data centers. By consolidating resources, including hardware and system administration, data center customers can reduce expenses. However, customers expect a reasonable level of service from data centers, even when demand for the services these systems provide varies. To maximize service across all customers, a data center can provide differentiated service levels to various applications (customers) based on a contracted Service Level Agreement (SLA).
SLAs specify requirements in terms of agreed-upon metrics (e.g., performance, availability, output bandwidth, server load), and they usually include several price points for different service levels. Unfortunately, with multithreaded system models (e.g., multiprocessors), simple extensions to conventional uniprocessor metrics can be misleading. The challenge is to develop hardware-level metrics that bridge the gap between low-level hardware behavior and high-level SLA metrics.
The proposed research addresses this need by exploring a design space that includes SLA metrics, system models, hardware-level metrics, and implementations. The project will develop hardware-level metrics by considering both the system model and the intended use of each metric. Once a metric is defined, various hardware implementations of it can be explored. A case-study SLA performance metric and a corresponding hardware metric called Critical Instructions Per Cycle (CIPC) have been developed. Preliminary results with full-system simulation and commercial workloads reinforce the hypothesis that metrics must capture the behavior of low-level hardware.
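
As a rough illustration of why a simple extension of a uniprocessor metric can mislead on a multiprocessor, the sketch below compares aggregate instructions per cycle (IPC) with an IPC that excludes instructions retired while busy-waiting on a lock. It is not the definition of CIPC, which this summary does not give. The `ThreadSample` record, the `spin_instructions` counter, and all numbers are hypothetical and chosen only to show the effect.

```python
from dataclasses import dataclass


@dataclass
class ThreadSample:
    """Hypothetical per-thread counters for one measurement interval."""
    cycles: int              # cycles in the interval
    instructions: int        # all retired instructions
    spin_instructions: int   # retired while busy-waiting (assumed to be measurable)


def raw_ipc(samples):
    """Aggregate IPC: total retired instructions over interval length."""
    interval = max(s.cycles for s in samples)
    return sum(s.instructions for s in samples) / interval


def useful_ipc(samples):
    """Aggregate IPC after discarding instructions spent spinning."""
    interval = max(s.cycles for s in samples)
    useful = sum(s.instructions - s.spin_instructions for s in samples)
    return useful / interval


# Two threads on a two-processor system, made-up counter values.
low_contention = [
    ThreadSample(cycles=1000, instructions=800, spin_instructions=0),
    ThreadSample(cycles=1000, instructions=800, spin_instructions=50),
]
high_contention = [
    ThreadSample(cycles=1000, instructions=500, spin_instructions=0),
    ThreadSample(cycles=1000, instructions=1200, spin_instructions=1100),
]

for name, samples in [("low", low_contention), ("high", high_contention)]:
    print(name, raw_ipc(samples), useful_ipc(samples))
```

With these made-up counters, aggregate IPC is higher under heavy lock contention even though far fewer instructions contribute to forward progress, which is the kind of discrepancy a hardware-level metric tied to an SLA would need to avoid.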