DeepSqueeze: Parallel Stochastic Gradient Descent with Double-Pass Error-Compensated Compression

07/17/2019
by Hanlin Tang, et al.

Communication is a key bottleneck in distributed training. Recently, an error-compensated compression technique was designed specifically for centralized learning and achieved great success, showing significant advantages over state-of-the-art compression-based methods in reducing communication cost. Since decentralized training has been shown to outperform traditional centralized training in communication-restricted scenarios, a natural question arises: how can the error-compensated technique be applied to decentralized learning to further reduce the communication cost? However, compression-based centralized training algorithms do not extend trivially to the decentralized setting; a key difference between centralized and decentralized training makes this extension highly non-trivial. In this paper, we propose an elegant algorithmic design that employs error-compensated stochastic gradient descent in the decentralized scenario, named DeepSqueeze. Both theoretical analysis and an empirical study are provided to show that the proposed DeepSqueeze algorithm outperforms existing compression-based decentralized learning algorithms. To the best of our knowledge, this is the first work to apply error-compensated compression to decentralized learning.
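To illustrate the error-compensation mechanism the abstract refers to, the following is a minimal NumPy sketch of error-compensated (error-feedback) compression on a single worker, assuming a top-k compressor. The names topk_compress and ErrorCompensatedCompressor, and the parameter k, are illustrative choices and not the paper's API; the full DeepSqueeze algorithm additionally combines such compressed updates with decentralized (gossip-style) averaging across neighbors.

```python
import numpy as np

def topk_compress(x, k):
    """Keep only the k largest-magnitude entries (a common biased compressor)."""
    out = np.zeros_like(x)
    idx = np.argpartition(np.abs(x), -k)[-k:]
    out[idx] = x[idx]
    return out

class ErrorCompensatedCompressor:
    """Error feedback: the compression error is stored locally and added back
    before the next compression step, so the compressor's bias is corrected
    over time. Illustrative sketch, not the paper's exact implementation."""

    def __init__(self, dim, k):
        self.error = np.zeros(dim)  # residual carried across iterations
        self.k = k

    def compress(self, v):
        corrected = v + self.error            # add back the previous residual
        compressed = topk_compress(corrected, self.k)
        self.error = corrected - compressed   # store the new residual
        return compressed

# Usage: compress a stream of (hypothetical) local gradients before communication.
ec = ErrorCompensatedCompressor(dim=10, k=2)
for _ in range(3):
    grad = np.random.randn(10)
    msg = ec.compress(grad)   # sparse message sent to neighbors/parameter server
```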


