Today, inexpensive computer systems based on commodity, off-the-shelf components can support hundreds of gigabytes of memory. Traditionally, the demand for large-memory systems came predominantly from operators of databases for applications such as high-volume transaction processing. Today, however, a much wider variety of applications drives the demand for such systems, ranging from server consolidation using virtualization to infrastructure for Web 2.0 applications. For many of these applications, at a scale of 100GB or more, virtual memory access becomes a bottleneck; specifically, the overhead of address translation increases dramatically. Large pages can mitigate this problem by significantly increasing translation look-aside buffer (TLB) coverage. However, all too often these applications exhibit poor temporal and/or spatial locality of reference, so even with large pages the TLB hit rate can be very low.
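To see why large pages help, and why they can still fall short at this scale, consider a back-of-the-envelope coverage calculation. The sketch below uses hypothetical TLB sizes (1,024 entries for 4KB pages, 32 entries for 2MB pages) chosen for illustration, not the parameters of any particular processor.

# Back-of-the-envelope TLB coverage calculation (illustrative sizes).
ENTRIES_4K = 1024    # hypothetical TLB entries for 4KB pages
ENTRIES_2M = 32      # hypothetical TLB entries for 2MB pages

coverage_4k = ENTRIES_4K * 4 * 1024          # bytes covered with 4KB pages
coverage_2m = ENTRIES_2M * 2 * 1024 * 1024   # bytes covered with 2MB pages

print(f"4KB pages: {coverage_4k / 2**20:.0f} MB of coverage")   # 4 MB
print(f"2MB pages: {coverage_2m / 2**20:.0f} MB of coverage")   # 64 MB
# Against a 100GB working set with poor locality, even 64 MB of
# coverage is a tiny fraction, so most accesses still miss in the TLB.

Even the large-page coverage here is well under 0.1% of a 100GB working set, which is why a workload with poor locality still misses in the TLB on most accesses.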

This research will develop novel architectural mechanisms and operating system support to mitigate the cost of address translation. Effective approaches to this problem include caching the internal levels of the page table in dedicated hardware, providing hardware support to exploit the physically contiguous memory reservations made by the operating system, and re-examining page table organizations for large address spaces. This research will explore all of these techniques, carefully considering the interaction between the operating system's memory allocation and management policies and the hardware's address translation overhead.
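As a rough illustration of the first of these techniques, the sketch below models a four-level, x86-64-style page walk in which a software dictionary stands in for a hardware cache of upper-level page table entries; a hit at a low level lets the walker skip the levels above it. All names and structures here are simplified assumptions for illustration, not the designs this project proposes.

# Simplified model of a 4-level page walk (x86-64 style) with a cache
# over the upper levels of the page table. All structures are toy
# stand-ins; real hardware differs in indexing and replacement.

LEVELS = 4            # PML4, PDPT, PD, PT
BITS_PER_LEVEL = 9    # 512 entries per table
PAGE_SHIFT = 12       # 4KB pages

def level_prefix(vaddr, level):
    """Virtual-address bits identifying the table at `level` (1 = leaf PT)."""
    shift = PAGE_SHIFT + BITS_PER_LEVEL * level
    return vaddr >> shift

walk_cache = {}  # maps (level, prefix) -> cached table "pointer"

def translate(vaddr):
    """Return the number of memory references needed for this walk."""
    # Find the deepest upper-level table the cache already knows about.
    for level in range(1, LEVELS):        # try the leaf PT's parent first
        if (level, level_prefix(vaddr, level)) in walk_cache:
            start = level
            break
    else:
        start = LEVELS                    # cold: full walk from the root
    refs = start                          # one reference per remaining level
    # Record the upper-level entries this walk touched.
    for level in range(1, LEVELS):
        walk_cache[(level, level_prefix(vaddr, level))] = object()
    return refs

print(translate(0x7f1234567000))  # 4: cold walk touches all four levels
print(translate(0x7f1234568000))  # 1: same 2MB region, only the PT access

In this toy model, the second lookup skips three of the four page-table references because both addresses fall within the same 2MB region; the hardware mechanisms studied here aim for this effect without any software involvement.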

This research will transform the way in which address translation is performed on future systems, enabling the effective use of hundreds of gigabytes of memory. Currently, large-memory machines suffer from address translation bottlenecks that limit overall performance. Because these machines consume significant amounts of power, these bottlenecks also translate into poor power efficiency, wasting both energy and money. More efficient address translation will yield significant improvements in performance and energy efficiency, especially in data centers with numerous large-memory machines.

Project Report

Today, inexpensive computer systems based on commodity, off-the-shelf components can support hundreds of gigabytes of memory. Traditionally, the demand for large-memory systems came predominantly from operators of databases for applications such as high-volume transaction processing. Now, however, a much wider variety of applications requires such systems, ranging from server consolidation using virtualization to infrastructure for Web 2.0 applications. For many of these applications, at a scale of 100GB or more, access to data in virtual memory becomes a bottleneck; specifically, the overhead of address translation increases dramatically. Large pages can mitigate this problem by significantly increasing TLB coverage. However, all too often these applications exhibit poor temporal and/or spatial locality of reference. For example, the memory access pattern of a database hash join is essentially random. Thus, even with large pages, the TLB hit rate can be very low.

The overall objective of this project has been to explore approaches and mechanisms to improve performance and reduce energy consumption in large-scale memory systems. Over the course of this project, we have explored a wide range of issues in modern memory systems, spanning the stack from hardware (TLBs, nested TLBs, and memory controllers) to system software (memory-based key-value stores and memory allocation). This holistic approach to memory system design has helped to illuminate where the performance bottlenecks in modern large-scale memory systems lie.

We have shown that address translation can have a large impact on system performance, both with and without virtualization. Taking a fresh look at address translation, we have developed several innovative hardware mechanisms that better match and exploit the allocation policies of modern operating systems. Our TLB architectures reduce address translation latencies without requiring any modifications to the system software.

We have also shown that modifying the system software yields further benefits. We have developed page clustering policies that strike a balance between the address translation benefits of large pages and the costs of performing unnecessary, speculative I/O. This work benefits applications by reducing the latency of both code and data accesses without increasing the amount of I/O.

Finally, we have shown that replacement cost is a critical factor in the design of large-scale, memory-based key-value stores. We have demonstrated that Web 2.0 applications can benefit from providing the cost of computing a value to the key-value store, and we have developed an innovative replacement policy that integrates the locality of access to key-value pairs with the cost of reinserting a pair after its removal. This policy runs in amortized constant time, making its overhead competitive with existing policies while yielding much better replacement decisions. This work will not only improve the performance of existing key-value stores, but will hopefully drive further innovation in the area.
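The flavor of such a cost-aware policy can be illustrated with a classic Greedy-Dual-style sketch, in which an inflating floor value captures recency and each entry's recomputation cost sets its priority above that floor. This is a simplified, heap-based illustration under assumed data structures, not the policy developed in this project; in particular, its per-operation cost is logarithmic, whereas the project's policy achieves amortized constant time.

import heapq

class CostAwareCache:
    """Toy Greedy-Dual-style cache: eviction priority combines recency
    (via the inflating floor L) with the cost of recomputing a value.
    An illustrative sketch, not this project's actual policy."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.L = 0.0        # floor: rises as items are evicted
        self.heap = []      # (priority, key) candidates; may hold stale entries
        self.entries = {}   # key -> (priority, value, cost)

    def get(self, key):
        if key not in self.entries:
            return None
        _, value, cost = self.entries[key]
        pri = self.L + cost              # refresh priority on access (locality)
        self.entries[key] = (pri, value, cost)
        heapq.heappush(self.heap, (pri, key))
        return value

    def put(self, key, value, cost):
        while len(self.entries) >= self.capacity and key not in self.entries:
            pri, victim = heapq.heappop(self.heap)
            if victim in self.entries and self.entries[victim][0] == pri:
                self.L = pri             # evicted priority becomes the new floor
                del self.entries[victim]
        pri = self.L + cost
        self.entries[key] = (pri, value, cost)
        heapq.heappush(self.heap, (pri, key))

# Cheap-to-recompute items are evicted before expensive ones of equal recency.
c = CostAwareCache(capacity=2)
c.put("cheap", "v1", cost=1.0)
c.put("dear", "v2", cost=100.0)
c.put("new", "v3", cost=1.0)   # evicts "cheap", not "dear"
print(sorted(c.entries))        # ['dear', 'new']

Under this scheme, an entry with a high recomputation cost survives longer than an equally recent but cheap entry, which is the tradeoff between locality and reinsertion cost described above.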
As the demand for memory capacity continues to increase at an incredible rate, it is becoming critical to minimize address translation overheads. Currently, large-memory machines suffer from address translation bottlenecks that limit overall performance, and because these machines consume significant amounts of power, those bottlenecks translate into poor power efficiency, wasting both energy and money. This research has advanced the state of the art in address translation, lowering these overheads for future large-scale memory systems. More efficient address translation will lead to significant improvements, especially in data centers with numerous large-memory machines.

Agency: National Science Foundation (NSF)
Institute: Division of Computer and Network Systems (CNS)
Type: Standard Grant (Standard)
Application #: 1018840
Program Officer: M. Mimi McClure
Budget Start: 2010-09-01
Budget End: 2014-08-31
Fiscal Year: 2010
Total Cost: $496,671
Name: Rice University
City: Houston
State: TX
Country: United States
Zip Code: 77005