To provide responsive services to end-users, cloud platforms are designed for low latency. Each layer of the stack, including local storage, distributed file systems, the communication layer, and coordination services, operates with an average latency of a few milliseconds. However, a typical user request flows through dozens of services. As a result, the latency tail is as important to responsiveness as the average: if 1 in 100 RPC requests experiences 50ms latency, then as the complexity of servicing an individual user request grows, so does the likelihood that the user's request will be "dragged down" by an outlier.
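To make the arithmetic behind this concern explicit, consider a request that fans out into N independent sub-requests, each with a 1-in-100 chance of hitting the 50ms outlier; the fan-out of 100 below is an illustrative assumption, not a measured figure:

P(\text{at least one tail sub-request}) = 1 - (1 - 0.01)^{N}, \qquad N = 100 \;\Rightarrow\; 1 - 0.99^{100} \approx 0.63.

Under these assumptions, roughly two-thirds of user requests would encounter at least one slow sub-request, even though each individual service is slow only 1% of the time.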
We are exploring techniques for constructing data center services that provide low latency tails. Hard drives can introduce high latency when seeking, whereas SSDs provide uniform access times even for random workloads. Distributed storage systems like GFS or BigTable provide good average latency, but at scale, "rare" events like tablet splitting become common. We are designing techniques that reduce the latency impact of these rare events and that exploit redundancy to provide a fast path even when they occur. Operating systems and transport protocols are another source of latency noise; we are applying techniques from soft real-time systems and QoS-oriented networks to condition these layers as well. By bringing down tail latency, we will also make protocols that provide strong data consistency more practical.
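One way redundancy can provide such a fast path is a hedged-request pattern: issue the request to a second replica only if the first has not replied within a short delay, and use whichever reply arrives first. The following Go sketch is illustrative rather than code from our systems; the 2ms/50ms replica latencies and the 5ms hedge delay are assumed values chosen to mirror the numbers above.

package main

import (
	"context"
	"fmt"
	"math/rand"
	"time"
)

// query simulates an RPC to one replica: latency is usually low, but
// 1 in 100 requests hits a 50ms outlier (hypothetical numbers).
func query(ctx context.Context, replica int, out chan<- string) {
	latency := 2 * time.Millisecond
	if rand.Intn(100) == 0 {
		latency = 50 * time.Millisecond
	}
	select {
	case <-time.After(latency):
		out <- fmt.Sprintf("reply from replica %d", replica)
	case <-ctx.Done(): // the other replica already answered
	}
}

// hedgedRead sends the request to a primary replica; if no reply arrives
// within hedgeDelay, it duplicates the request to a backup replica and
// returns whichever reply comes back first.
func hedgedRead(hedgeDelay time.Duration) string {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()

	out := make(chan string, 2) // buffered so late replies never block
	go query(ctx, 0, out)

	select {
	case r := <-out:
		return r // primary replied before the hedge fired
	case <-time.After(hedgeDelay):
		go query(ctx, 1, out) // hedge: duplicate to a second replica
	}
	return <-out
}

func main() {
	fmt.Println(hedgedRead(5 * time.Millisecond))
}

The design point is that the duplicate request is sent only after the hedge delay expires, so the extra load is confined to the small fraction of requests that are already slow, while the tail of the response-time distribution is bounded by the faster of the two replicas.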
As part of our work, we are incorporating these topics into the undergraduate and graduate curriculum, including a graduate course on operating systems.