Gradient Coding with Dynamic Clustering for Straggler Mitigation

11/03/2020
by Baturalp Buyukates, et al.

In distributed synchronous gradient descent (GD), the slowest straggling workers are the main performance bottleneck for the per-iteration completion time. To speed up GD iterations in the presence of stragglers, coded distributed computation techniques assign redundant computations to workers. In this paper, we propose a novel gradient coding (GC) scheme with dynamic clustering, denoted GC-DC, to speed up the gradient calculation. Under time-correlated straggling behavior, GC-DC regulates the number of straggling workers in each cluster based on the straggler behavior in the previous iteration. We numerically show that GC-DC significantly improves the average per-iteration completion time with no increase in the communication load compared to the original GC scheme.
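To make the dynamic-clustering idea concrete, here is a minimal Python sketch, not the paper's actual coding scheme: workers straggle according to a two-state Markov chain (a simple stand-in for time-correlated straggling), and each iteration the workers are re-clustered so that the previous iteration's stragglers are spread evenly across clusters. All names and parameter values (worker count, cluster count, transition probabilities, the round-robin balancing heuristic) are illustrative assumptions.

```python
import random

N_WORKERS = 12    # total workers (assumed)
N_CLUSTERS = 4    # equal-size clusters (assumed)
P_BECOME = 0.2    # P(non-straggler -> straggler), assumed
P_RECOVER = 0.4   # P(straggler -> non-straggler), assumed

def step_stragglers(straggling):
    """Advance each worker's straggler state one iteration under a
    two-state Markov chain (models time-correlated straggling)."""
    return [
        (random.random() < P_BECOME) if not s else (random.random() >= P_RECOVER)
        for s in straggling
    ]

def dynamic_clusters(straggling):
    """Re-cluster workers so last iteration's stragglers are balanced
    across clusters: deal stragglers out round-robin first, then the
    remaining workers. This mimics the goal of GC-DC, not its exact rule."""
    stragglers = [w for w, s in enumerate(straggling) if s]
    fast = [w for w, s in enumerate(straggling) if not s]
    clusters = [[] for _ in range(N_CLUSTERS)]
    for i, w in enumerate(stragglers + fast):
        clusters[i % N_CLUSTERS].append(w)
    return clusters

straggling = [False] * N_WORKERS
for it in range(5):
    straggling = step_stragglers(straggling)
    clusters = dynamic_clusters(straggling)
    per_cluster = [sum(straggling[w] for w in c) for c in clusters]
    print(f"iter {it}: stragglers per cluster = {per_cluster}")
```

Balancing stragglers across clusters matters because, in clustered GC, each cluster can tolerate only a fixed number of stragglers; concentrating stragglers in one cluster stalls that cluster's gradient recovery and hence the whole iteration.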

Related research

03/01/2021
Gradient Coding with Dynamic Clustering for Straggler-Tolerant Distributed Learning
Distributed implementations are crucial in speeding up large scale machi...

03/05/2019
Gradient Coding with Clustering and Multi-message Communication
Gradient descent (GD) methods are commonly employed in machine learning ...

05/16/2022
Two-Stage Coded Federated Edge Learning: A Dynamic Partial Gradient Coding Perspective
Federated edge learning (FEL) can train a global model from terminal ...

11/27/2021
DSAG: A mixed synchronous-asynchronous iterative method for straggler-resilient learning
We consider straggler-resilient learning. In many previous works, e.g., ...

12/06/2019
Communication-Efficient Network-Distributed Optimization with Differential-Coded Compressors
Network-distributed optimization has attracted significant attention in ...

06/02/2020
Age-Based Coded Computation for Bias Reduction in Distributed Learning
Coded computation can be used to speed up distributed learning in the pr...

08/07/2018
Speeding Up Distributed Gradient Descent by Utilizing Non-persistent Stragglers
Distributed gradient descent (DGD) is an efficient way of implementing g...
