SWP: Microsecond Network SLOs Without Priorities

by Kevin Zhao, et al.

The increasing use of cloud computing for latency-sensitive applications has sparked renewed interest in providing tight bounds on network tail latency. Achieving this in practice at reasonable network utilization has proved elusive due to a combination of highly bursty application demand, faster link speeds, and heavy-tailed message sizes. While priority scheduling can be used to reduce tail latency for some traffic, this comes at the cost of much worse delay behavior for all other traffic on the network. Most operators choose to run their networks at very low average utilization, despite the added cost, and yet still suffer poor tail behavior. This paper takes a different approach. We build a system, swp, to help operators and network designers understand and control tail latency without relying on priority scheduling. As the network workload changes, swp is designed to give real-time advice on the network switch configurations needed to maintain tail latency objectives for each traffic class. The core of swp is an efficient model for simulating the combined effect of traffic characteristics, end-to-end congestion control, and switch scheduling on service-level objectives (SLOs), along with an optimizer that adjusts the switch-level scheduling weights assigned to each class. Using simulation across a diverse set of workloads with different SLOs, we show that to meet the same SLOs as swp provides, FIFO would require 65 more for scenarios with tight SLOs on bursty traffic classes.


