Randomized Local Fast Rerouting for Datacenter Networks with Almost Optimal Congestion

08/04/2021
by Gregor Bankhamer, et al.

To ensure high availability, datacenter networks must rely on local fast rerouting mechanisms that allow routers to react quickly to link failures in a fully decentralized manner. However, configuring these mechanisms to provide high resilience against multiple failures while avoiding congestion along failover routes is algorithmically challenging, as the rerouting rules can only depend on local failure information and must be defined ahead of time. This paper presents a randomized local fast rerouting algorithm for Clos networks, the predominant datacenter topologies. Given a graph G=(V,E) describing a Clos topology, our algorithm defines local routing rules for each node v∈V, which depend only on the packet's destination and are conditioned on the incident link failures. We prove that as long as the number of failures at each node does not exceed a certain bound, our algorithm achieves asymptotically minimal congestion, up to polyloglog factors, along failover paths. Our lower bounds are developed under some natural routing assumptions.
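To make the shape of such a rule concrete, the following is a minimal sketch and not the paper's algorithm. It assumes the simplest possible randomized rule: a node forwards a packet to a uniformly random candidate next hop whose incident link has not failed. The function name `next_hop`, the port labels, and the uniform choice are all hypothetical, introduced only for illustration.

```python
import random


def next_hop(candidates, failed_links, rng=random):
    """Pick a forwarding port using only local information.

    candidates   -- ports that lead toward the packet's destination
                    (assumed to be precomputed per destination)
    failed_links -- set of incident ports currently known to be down
    rng          -- source of randomness (hypothetical parameter)
    """
    alive = [port for port in candidates if port not in failed_links]
    if not alive:
        return None  # no surviving candidate; the sketch simply gives up
    # Assumption: uniform random choice among the surviving ports.
    return rng.choice(alive)


# Toy example: one node with four uplinks, one of which has failed.
uplinks = ["spine0", "spine1", "spine2", "spine3"]
failed = {"spine1"}
rng = random.Random(0)

counts = {port: 0 for port in uplinks}
for _ in range(10_000):
    counts[next_hop(uplinks, failed, rng)] += 1
```

In this toy run the failed port receives no packets and the load is spread roughly evenly over the three surviving uplinks. The decision uses only the destination's candidate set and the node's own incident failures, which is the locality constraint the abstract describes.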


