This research concerns how best to manage contention for data center network resources. Data centers are among the fastest growing segments of the computer industry, and networks connect the computers in a data center to allow them to communicate. Just as roads can become congested when too many people try to use them at the same time, data center networks can become congested when too many applications try to send data at the same time. Most networks today use an end-to-end control mechanism: as the network becomes congested, it sends signals back to the computers to slow down. It might seem that faster networks would help, but the opposite is true: the amount of network communication is also rapidly increasing, and on a faster network more data can be sent before the feedback mechanism can kick in to control traffic. This project, a collaboration between investigators at the University of Washington and the Massachusetts Institute of Technology, explores a different approach, in which feedback occurs within the network, hop by hop between network switches, and only for those applications that are sending too fast.
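The point that faster links let more data escape before feedback takes effect can be made concrete with a little arithmetic: the data sent during one feedback delay is simply the link rate multiplied by that delay. The sketch below is illustrative only; the function name `bytes_in_flight`, the 10-microsecond feedback delay, and the link speeds are assumptions chosen for the example, not figures from the project.

```python
def bytes_in_flight(link_gbps: int, feedback_delay_us: int) -> float:
    """Bytes a sender transmits at full line rate before feedback can arrive.

    1 Gbit/s for 1 microsecond is 1000 bits, so the product of the two
    arguments times 1000 gives bits; dividing by 8 gives bytes.
    """
    bits = link_gbps * feedback_delay_us * 1000
    return bits / 8


# Hypothetical link speeds and a hypothetical 10 us feedback delay.
FEEDBACK_DELAY_US = 10
for gbps in (10, 100, 400):
    sent = bytes_in_flight(gbps, FEEDBACK_DELAY_US)
    print(f"{gbps:>3} Gbit/s: {sent / 1000:.1f} kB sent before feedback arrives")
```

With the delay held fixed, the amount of uncontrolled data grows in direct proportion to link speed, which is why shortening the feedback path, as hop-by-hop feedback aims to do, matters more as links get faster.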

The challenges for congestion control in data centers include rapidly increasing workload demand, ever faster links, small average transfer sizes, extremely bursty traffic, and limited switch buffer capacity. Existing end-to-end congestion control systems are far from optimal in these settings, and this is particularly noticeable for latency-sensitive applications. Many data center operators compensate by using priorities and/or running their networks at very low average utilization, but this raises costs without fully solving the problem. This research attempts to understand the benefits and limits of an alternative approach to congestion control for data center networks, based on per-hop flow control. The research will (i) develop a theoretical framework to quantify the difference between the two approaches, (ii) demonstrate a practical implementation on modern programmable data center network switches, and (iii) understand and develop solutions for the engineering challenges of using per-hop flow control in data centers.

If successful, the research will help enable an emerging class of latency-sensitive applications to be deployed within and across data centers at lower cost, for bursty traffic patterns and for the very high bandwidth networks now being developed in industry. Data center network technologies are rapidly evolving, and so a key aspect of this research is to develop materials to help train undergraduate and graduate students for the challenges that latency-sensitive applications pose for data center networks.

The project website, www.cs.washington.edu/homes/tom/backpressure/, contains copies of all project papers, presentations, source code, simulations, experimental results, and teaching materials. Additional material will be placed there as the project progresses, and will be maintained for a minimum of ten years after the completion of the project.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Agency: National Science Foundation (NSF)
Institute: Division of Computer and Network Systems (CNS)
Type: Standard Grant (Standard)
Application #: 2006827
Program Officer: Deepankar Medhi
Project Start:
Project End:
Budget Start: 2020-10-01
Budget End: 2022-09-30
Support Year:
Fiscal Year: 2020
Total Cost: $250,000
Indirect Cost:
Name: Massachusetts Institute of Technology
Department:
Type:
DUNS #:
City: Cambridge
State: MA
Country: United States
Zip Code: 02139