On the Benefits of Multiple Gossip Steps in Communication-Constrained Decentralized Optimization

11/20/2020
by Abolfazl Hashemi, et al.

In decentralized optimization, it is common algorithmic practice to have nodes interleave (local) gradient descent iterations with gossip (i.e., averaging over the network) steps. Motivated by the training of large-scale machine learning models, it is also increasingly common to require that messages be lossily compressed versions of the local parameters. In this paper, we show that, in such compressed decentralized optimization settings, there are benefits to having multiple gossip steps between subsequent gradient iterations, even when the cost of doing so is appropriately accounted for, e.g., by reducing the precision of the compressed information. In particular, we show that having O(log 1/ϵ) gradient iterations with constant step size, and O(log 1/ϵ) gossip steps between every pair of these iterations, enables convergence to within ϵ of the optimal value for smooth non-convex objectives satisfying the Polyak–Łojasiewicz condition. This result also holds for smooth strongly convex objectives. To our knowledge, this is the first work that derives convergence results for non-convex optimization under arbitrary communication compression.
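To make the interleaving concrete, below is a minimal NumPy sketch of one such scheme: each of T gradient iterations is followed by Q gossip rounds in which nodes exchange quantized parameters over a doubly stochastic mixing matrix W. The stochastic quantizer, the mean-preserving gossip update, and all function and parameter names here are illustrative assumptions, not the paper's exact algorithm.

```python
import numpy as np

def quantize(x, bits=8):
    # Unbiased stochastic uniform quantization: a stand-in for the
    # paper's generic (arbitrary) compression operator.
    scale = np.max(np.abs(x)) + 1e-12
    levels = 2 ** (bits - 1) - 1
    y = x / scale * levels
    low = np.floor(y)
    # Round up with probability equal to the fractional part,
    # so E[quantize(x)] = x.
    q = low + (np.random.rand(*x.shape) < (y - low))
    return q * scale / levels

def decentralized_gd(grads, x0, W, T, Q, lr, bits=8):
    """Interleave local gradient steps with Q compressed gossip rounds.

    grads : list of per-node gradient oracles, grads[i](x) -> ndarray
    x0    : common initial parameter vector (shape (d,))
    W     : doubly stochastic mixing matrix (n x n)
    T     : number of gradient iterations (O(log 1/eps) in the paper)
    Q     : gossip steps between gradient iterations (also O(log 1/eps))
    """
    n = W.shape[0]
    X = np.tile(x0, (n, 1))  # row i holds node i's parameters
    for _ in range(T):
        # Local gradient step at every node.
        X = X - lr * np.stack([grads[i](X[i]) for i in range(n)])
        # Multiple gossip (averaging) steps on compressed parameters.
        for _ in range(Q):
            C = np.stack([quantize(X[i], bits) for i in range(n)])
            # x_i <- x_i + sum_j W_ij C(x_j) - C(x_i); because W is
            # doubly stochastic, this update preserves the network
            # average of X exactly despite the quantization.
            X = X + W @ C - C
    return X.mean(axis=0)
```

The gossip form x_i + Σ_j W_ij C(x_j) − C(x_i) is chosen here because it keeps the network-wide average of the parameters invariant under compression; replacing it with the naive X = W @ C would drift the average by the quantization error. Reducing `bits` while increasing `Q` is the precision-for-rounds trade-off the abstract alludes to.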


Related research

07/20/2022 · Adaptive Step-Size Methods for Compressed SGD
Compressed Stochastic Gradient Descent (SGD) algorithms have been recent...

10/31/2019 · SPARQ-SGD: Event-Triggered and Compressed Communication in Decentralized Stochastic Optimization
In this paper, we propose and analyze SPARQ-SGD, which is an event-trigg...

08/10/2021 · Decentralized Composite Optimization with Compression
Decentralized optimization and communication compression have exhibited ...

06/14/2019 · Distributed Optimization for Over-Parameterized Learning
Distributed optimization often consists of two updating phases: local op...

01/03/2023 · Decentralized Gradient Tracking with Local Steps
Gradient tracking (GT) is an algorithm designed for solving decentralize...

02/28/2017 · Optimal algorithms for smooth and strongly convex distributed optimization in networks
In this paper, we determine the optimal convergence rates for strongly c...

08/21/2019 · BRIDGE: Byzantine-resilient Decentralized Gradient Descent
Decentralized optimization techniques are increasingly being used to lea...
