Decentralization Meets Quantization

by   Hanlin Tang, et al.
ETH Zurich
University of Rochester

Optimizing distributed learning systems is an art of balancing between computation and communication. There have been two lines of research that try to deal with slower networks: quantization for low bandwidth networks, and decentralization for high latency networks. In this paper, we explore a natural question: can the combination of both decentralization and quantization lead to a system that is robust to both bandwidth and latency? Although the system implication of such combination is trivial, the underlying theoretical principle and algorithm design is challenging: simply quantizing data sent in a decentralized training algorithm would accumulate the error. In this paper, we develop a framework of quantized, decentralized training and propose two different strategies, which we call extrapolation compression and difference compression. We analyze both algorithms and prove both converge at the rate of O(1/√(nT)) where n is the number of workers and T is the number of iterations, matching the convergence rate for full precision, centralized training. We evaluate our algorithms on training deep learning models, and find that our proposed algorithm outperforms the best of merely decentralized and merely quantized algorithm significantly for networks with both high latency and low bandwidth.


page 1

page 2

page 3

page 4


Decentralized Deep Learning with Arbitrary Communication Compression

Decentralized training of deep learning models is a key element for enab...

DeepSqueeze: Decentralization Meets Error-Compensated Compression

Communication is a key bottleneck in distributed training. Recently, an ...

Communication-Efficient Decentralized Learning with Sparsification and Adaptive Peer Selection

Distributed learning techniques such as federated learning have enabled ...

DeepSqueeze: Parallel Stochastic Gradient Descent with Double-Pass Error-Compensated Compression

Communication is a key bottleneck in distributed training. Recently, an ...

Moniqua: Modulo Quantized Communication in Decentralized SGD

Running Stochastic Gradient Descent (SGD) in a decentralized fashion has...

Decentralized Parallel Algorithm for Training Generative Adversarial Nets

Generative Adversarial Networks (GANs) are powerful class of generative ...

PowerGossip: Practical Low-Rank Communication Compression in Decentralized Deep Learning

Lossy gradient compression has become a practical tool to overcome the c...

Please sign up or login with your details

Forgot password? Click here to reset