Traditional datacenters are built from servers, each of which tightly integrates a small amount of CPU (central processing unit), memory and storage on a single motherboard. However, the end of Dennard scaling and the slowdown of Moore's Law have exposed several fundamental limitations of such server-centric architectures (e.g., the memory-capacity wall, which makes CPU-memory co-location unsustainable). Consequently, a new computing paradigm is emerging: a disaggregated architecture, in which each resource type is built as a standalone 'blade' and a network fabric interconnects the resource blades within and across racks. The computer architecture community has established a number of benefits of such disaggregated architectures, including the potential for 10-100x larger resource capacity. While beneficial from the computer architecture perspective, disaggregated architectures alter several assumptions that once guided the design and optimization of existing networks, systems and applications (e.g., CPU-storage co-location, high CPU-memory bandwidth, storage hierarchy, data locality and failure models). Capitalizing on the benefits of disaggregated architectures will thus require re-architecting legacy systems and networks. This project aims to co-design the network, storage and compute fabrics for disaggregated datacenters.

On the network front, the project will design ultra-low-latency intra-rack and inter-rack fabrics, including a new network software stack that incorporates efficient congestion control, failure tolerance and scheduling mechanisms. The co-design of network and storage fabrics will lead to new (distributed) memory and storage management stacks for disaggregated storage, and to a resource manager that provides essential isolation, sharing and elasticity guarantees across multiple applications sharing the disaggregated storage and network fabrics. Finally, the project will build new distributed programming frameworks and re-architect existing applications to operate efficiently and correctly on disaggregated architectures.

This project will provide solutions to some of the most difficult and important technical questions surrounding this emerging computing paradigm, and will have broad community impact, primarily through educational and outreach activities and technology transfer. Software artifacts resulting from this project will be publicly released to ensure repeatability and to foster follow-up research. The project also has a substantial educational component, including new courses and the public release of teaching materials. Finally, the project will provide the necessary thrust to build an interdisciplinary research community through mentoring of graduate students and postdoctoral scholars, yearly workshops, and industry retreats that bridge the gap between industrial development and academic research.

Agency
National Science Foundation (NSF)
Institute
Division of Computer and Network Systems (CNS)
Application #
1704742
Program Officer
Darleen Fisher
Project Start
Project End
Budget Start
2017-09-15
Budget End
2022-08-31
Support Year
Fiscal Year
2017
Total Cost
$1,742,784
Indirect Cost
Name
Cornell University
Department
Type
DUNS #
City
Ithaca
State
NY
Country
United States
Zip Code
14850