Data centers, the factories of the digital age, can consume as much power as a city of two million people, and together account for two percent of the world’s electricity. Larger data centers can comprise over a million servers, each of which houses CPUs, memory, and storage. Memory, in particular, can account for as much as 46% of average system energy, yet memory utilization in today’s data centers can be as low as 20-30%. A key contributor to this problem is poor provisioning and utilization of memory across data center applications. To address it, recent proposals have argued for memory disaggregation, which physically separates memory and CPUs into separate blades and connects them via the network. This approach not only promises better memory and CPU utilization, significantly improving data center energy efficiency, but also offers a number of additional benefits. Unfortunately, such physical separation comes at a cost in memory access performance, limiting its applicability. This project envisions a radically new design for memory disaggregation, which places memory management at emerging programmable elements in the network to enable high performance for disaggregated memory. If successful, this research will incentivize cloud providers to transition their data centers to disaggregated architectures, improving memory utilization and reducing energy consumption and, consequently, the total cost of ownership of their infrastructure. Planned outreach and curriculum development as part of this project will broaden participation of underrepresented groups and educate high school, undergraduate, and graduate students on cloud systems and data center architectures.

Over the last few years, significant improvements in inter-server network performance, coupled with stagnating intra-server interconnect performance, have driven advances in data center resource disaggregation, where server compute, memory, and storage resources are physically separated into network-attached resource “blades”. However, realizing the benefits of resource disaggregation while preserving application performance requires operating system (OS) support. Unfortunately, existing proposals to this end expose a hard tradeoff between application performance on the one hand and resource elasticity on the other. The driving vision of this project is a fundamentally new network-centric design for the disaggregated OS: one that places resource management and access functionality in the data center network fabric to break this tradeoff. This proposal focuses specifically on in-network memory management for the envisioned OS, and will exploit recent advances in programmable network hardware to realize the memory subsystem design. The end goal is a data center-scale shared memory abstraction, where each disaggregated core can efficiently access any memory word in the data center’s disaggregated memory pool. The research goals of the project are to (i) enable compute/memory elasticity and hardware flexibility via network-assisted shared memory; (ii) facilitate performant access to network-attached memory via network-driven optimizations; and (iii) ensure scalability and fault tolerance of the memory subsystem for data center-wide disaggregation. The project also provides a multidisciplinary platform to realize the educational objectives of (i) developing system and experimental components for our systems and networking curriculum, (ii) involving undergraduate students in publishable research, and (iii) promoting science and engineering among high school students and underrepresented populations.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Agency: National Science Foundation (NSF)
Institute: Division of Computer and Network Systems (CNS)
Application #: 2047220
Program Officer: Ann Von Lehmen
Budget Start: 2021-03-01
Budget End: 2026-02-28
Fiscal Year: 2020
Total Cost: $117,313
Name: Yale University
City: New Haven
State: CT
Country: United States
Zip Code: 06520