The ability to embed computation on the I/O path creates exciting possibilities for accelerating a wide spectrum of data-intensive applications, ranging from content-based retrieval of images stored on a personal computer to fusion of massive geospatial datasets. The presence of processing power at the storage devices, coupled with the location of these devices in the system, also opens up the opportunity to provide storage-centric computing services that run directly on the I/O path. However, harnessing a large amount of processing power on the I/O path at a small energy cost requires extensive architectural support. This research project develops a storage-centric architecture in which the entire I/O path is treated as a programmable and reconfigurable computational substrate. The research addresses several cross-cutting issues in electro-mechanical design, processor microarchitecture, and parallel programming. An integral component of this project is the development of simulation tools and a hardware testbed for research and education in the area of adaptive active storage systems.

Project Report

We are in the era of data-centric computing. Many applications, such as big data, high-performance computing (HPC), and social networking, generate and process massive amounts of data. The design of the memory and storage system plays a pivotal role in determining the performance and energy-usage characteristics of servers and data centers that run such applications. The goal of this NSF CAREER award was to explore the architecture design space of the storage system for such applications. This included extensions to conventional server and storage device architectures, exploring the role of emerging memory technologies, and developing tools that facilitate design-space exploration. The initial thrust of our research effort was to explore new architectures to reduce the cost of data movement between processors and the storage system. The first step was to explore "processing-in-storage" architectures, where data-intensive computation is performed in or close to the data storage medium in a disk drive. We found that it is possible to increase the amount of processing power inside a disk drive and still operate within the power budget by trading off energy usage between the spindle and the electronics. To enable even higher performance, we next looked at disk drive architectures that provide parallelism inside the drive. The latter effort also showed promise as a technique to significantly reduce the energy usage of a storage system. We also developed techniques for effective energy reduction of conventional disk-based storage systems. As flash-based Solid State Drives (SSDs) started becoming more common in the marketplace, the research emphasis shifted from hard drives to SSDs. While flash memory provides low access latency for reads, it still suffers from write bottlenecks. Moreover, flash memory has a limited lifetime: the memory cells wear out after repeated writes and erases.
We studied the reliability problem and showed that it is possible to achieve orders of magnitude higher endurance than those quoted in manufacturer datasheets by leveraging the self-recovery property inherent to flash memory. We showed that using a high-endurance non-volatile memory, such as Spin Transfer Torque RAM (STT-RAM), as a write merge-buffer in the SSD can significantly boost both performance and reliability. We also developed tools that allow detailed study of flash memory tradeoffs. We expanded our research from storage to examine non-volatility at other layers of the memory hierarchy. In particular, we looked at the use of STT-RAM for caches and main memory. While STT-RAM allows for high density and has virtually no leakage power (since data is not stored in the form of charges), it is slower than SRAM and its write energy is high. We showed that it is possible to address these problems by reducing the retention time of the STT-RAM cells. Although reduced non-volatility can be problematic for long-term storage, caches and main memory hold data for much shorter durations than storage, so this tradeoff is acceptable. We also developed other circuit-level techniques to reduce write latency and energy. Finally, we developed an STT-RAM memory design tool. A key broader impact of this research was the development of a tutorial series on non-volatile memory with colleagues at IBM Research. These tutorials were given at major architecture conferences and at the Non-Volatile Memories Workshop in 2011, where there were over 140 registered attendees. We also co-authored a Synthesis Lectures on Computer Architecture book on this topic.
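The write merge-buffer idea can be illustrated with a minimal sketch (all names here are hypothetical, and this is a toy model rather than the project's actual design): writes are absorbed by a small, fast non-volatile buffer, and repeated writes to the same logical page are coalesced so that only the final version is programmed to flash, reducing both write latency and cell wear.

```python
class MergeBuffer:
    """Toy model of a fast non-volatile write buffer in front of flash.

    Repeated writes to the same logical page are coalesced in the buffer,
    so only the final version is flushed to flash. Fewer flash program
    operations mean better write performance and less cell wear-out.
    """

    def __init__(self, capacity):
        self.capacity = capacity
        self.buffer = {}          # logical page number -> latest data
        self.flash_programs = 0   # count of pages actually written to flash

    def write(self, page, data):
        # Evict one buffered page if a new page would exceed capacity.
        if page not in self.buffer and len(self.buffer) >= self.capacity:
            self._evict_one()
        self.buffer[page] = data  # coalesce: overwrite any buffered version

    def _evict_one(self):
        # Flush the oldest buffered page to flash (FIFO in this toy model).
        victim = next(iter(self.buffer))
        del self.buffer[victim]
        self.flash_programs += 1

    def flush_all(self):
        # Drain the buffer, e.g. at shutdown.
        self.flash_programs += len(self.buffer)
        self.buffer.clear()


buf = MergeBuffer(capacity=4)
# 100 writes, but to only 2 distinct pages: they coalesce in the buffer.
for i in range(100):
    buf.write(page=i % 2, data=i)
buf.flush_all()
print(buf.flash_programs)  # 2 flash programs instead of 100
```

The same coalescing that saves flash program operations is what improves reliability: each avoided program/erase cycle is wear the flash cells never incur.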

Agency: National Science Foundation (NSF)
Institute: Division of Computer and Communication Foundations (CCF)
Application #: 0643925
Program Officer: Almadena Y. Chtchelkanova
Project Start:
Project End:
Budget Start: 2007-04-01
Budget End: 2013-03-31
Support Year:
Fiscal Year: 2006
Total Cost: $412,000
Indirect Cost:
Name: University of Virginia
Department:
Type:
DUNS #:
City: Charlottesville
State: VA
Country: United States
Zip Code: 22904