Modern datacenters host online services that are decomposed into multiple software tiers spanning thousands of servers. Servers coordinate and communicate with each other using Remote Procedure Calls (RPCs) over the internal datacenter network. The ongoing productivity-boosting software architecture trend of microservices is pushing software decomposition of deployed services to the extreme, resulting in more frequent inter-server communication and finer-grained RPCs, often with runtimes of only a few microseconds. With shrinking per-RPC runtime, networking efficiency directly impacts the performance of an online service as a whole: networking-related overheads that would otherwise be negligible are amplified by the fine-grained nature of the application-level logic triggered per RPC. A promising approach to address this challenge is to promote the role of each server's NIC (the gateway between a server's compute resources and the network) from simple RPC delivery to active RPC manipulation. Historically, the NIC agnostically delivers incoming packets by writing them into memory; the packets are later picked up by a CPU core for processing, resulting in excess data movement, inter-core synchronization overheads, or inter-core load imbalance. ROAr is a new server architecture optimized for efficient handling of microsecond-scale RPCs, featuring a NIC that dynamically monitors system-wide behavior and intelligently steers incoming RPCs within the server's memory hierarchy, including direct placement in CPU cores' private caches. An RPC-oriented protocol allows the NIC to raise the level of abstraction it operates on from network packets to RPCs, i.e., from data chunks to messages that trigger some computation. The more information exposed to the NIC about the computation an RPC will trigger, the better the steering decision the NIC can make. In the ROAr architecture, the NIC monitors incoming RPCs and makes a number of novel decisions to judiciously distribute them within a modern server's memory hierarchy and across CPU cores. Overall, ROAr's techniques can drastically improve the efficiency and performance of handling microsecond-scale RPCs. A direct consequence is improved quality of service for the plethora of large-scale online services deployed in modern datacenters that make heavy use of such RPCs. Therefore, ROAr has the potential to influence the design of future server architectures.
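As a concrete illustration of the kind of RPC-level metadata such an RPC-oriented protocol might expose to the NIC, the sketch below defines a hypothetical per-message descriptor. The struct and its field names (handler_id, deadline_us, and so on) are assumptions made purely for illustration; they do not describe ROAr's actual wire format.

```cpp
#include <cstdint>

// Hypothetical RPC-level descriptor that an RPC-oriented transport could
// prepend to each message. Unlike a raw packet header, it identifies the
// computation the message will trigger, which is the information a NIC
// needs to make an informed steering decision. All fields are illustrative.
struct RpcDescriptor {
    uint64_t rpc_id;         // identifies the RPC across its packets
    uint32_t handler_id;     // which handler (code path) the RPC will invoke
    uint32_t payload_bytes;  // total message size
    uint32_t deadline_us;    // service-level deadline, enables early rejection
    uint16_t priority;       // scheduling hint
    uint16_t flags;          // e.g., request vs. response
};

// With packet-level delivery the NIC sees only opaque bytes; with an
// RPC-oriented protocol it can, for example, read the deadline directly.
inline uint32_t rpc_deadline_us(const RpcDescriptor& d) { return d.deadline_us; }
```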

ROAr involves extensive hardware-software co-design, breaking the conventionally rigid boundary between network and compute. The NIC's role is promoted from oblivious placement of incoming RPCs into a server's memory hierarchy to active RPC acceleration. The NIC's natural position in an RPC's processing lifetime establishes it as an excellent agent to stage the cache hierarchy for optimized data movement and reduced latency. ROAr features three main mechanisms. First, it alleviates detrimental memory-bandwidth interference by keeping all incoming RPCs within the cache hierarchy, early-rejecting requests that are predicted to miss their deadline because of excessive ongoing queuing. Second, ROAr makes dynamic inter-core load-balancing decisions for incoming RPCs by taking into account real-time system load information, and steers RPCs all the way to the private cache of the selected CPU core. Third, while an RPC is queued, waiting to be executed, the NIC prefetches the RPC's corresponding instructions and critical data, thus accelerating the RPC's startup time when it is eventually picked up by the core for processing. The nature of such prefetching is novel, as decisions are based not on prediction but on prescience: the NIC's early knowledge of an RPC's arrival from the network. The proposed research involves theoretical modeling, simulation, and prototyping. Queuing analysis over a variety of RPC service-time distributions will be conducted to develop NIC-driven inter-core load-distribution policies. A cycle-accurate simulation model of ROAr will be developed to evaluate in-cache network buffer management, RPC-to-core steering, and prefetching mechanisms. Finally, the applicability of NIC-driven load-balancing policies to existing architectures featuring discrete NICs will be evaluated with the use of a programmable FPGA-based NIC.
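The steering and early-rejection mechanisms can be pictured as a single decision procedure on the NIC. The sketch below is a minimal baseline (join the core with the shortest estimated queuing delay, subject to a deadline check) written under assumed inputs: per-core queue depths and recent mean service times standing in for the real-time load information the NIC is described as monitoring. It is not ROAr's actual policy; the proposed queuing analysis is what would derive policies suited to different service-time distributions.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <optional>

constexpr std::size_t kCores = 16;

struct CoreState {
    uint32_t queued_rpcs;      // RPCs already waiting in this core's queue
    uint32_t mean_service_us;  // recent average per-RPC service time on this core
};

// Returns the chosen core, or std::nullopt if the RPC should be rejected
// early because even the least-loaded core is predicted to miss its deadline.
std::optional<std::size_t> steer(const std::array<CoreState, kCores>& cores,
                                 uint32_t deadline_us) {
    std::size_t best = 0;
    uint64_t best_delay = UINT64_MAX;
    for (std::size_t c = 0; c < kCores; ++c) {
        // Estimated queuing delay if this RPC joins core c's queue.
        uint64_t delay = static_cast<uint64_t>(cores[c].queued_rpcs) *
                         cores[c].mean_service_us;
        if (delay < best_delay) {
            best_delay = delay;
            best = c;
        }
    }
    if (best_delay > deadline_us) {
        return std::nullopt;  // early rejection: predicted deadline miss
    }
    // At this point the NIC would also place the RPC's payload into the chosen
    // core's private cache rather than DRAM, and issue prefetch hints for the
    // handler's instructions and critical data while the RPC waits in queue.
    return best;
}
```

A core returned by steer would then receive the RPC directly in its private cache, together with prefetch hints for the handler the RPC will invoke, mirroring the second and third mechanisms described above.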

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Budget Start: 2020-10-01
Budget End: 2023-09-30
Fiscal Year: 2020
Total Cost: $400,000
Name: Georgia Tech Research Corporation
City: Atlanta
State: GA
Country: United States
Zip Code: 30332