Decoupled Greedy Learning of CNNs for Synchronous and Asynchronous Distributed Learning

06/11/2021
by Eugene Belilovsky, et al.

A commonly cited inefficiency of neural network training using back-propagation is the update-locking problem: each layer must wait for the signal to propagate through the full network before updating. Several alternatives that can alleviate this issue have been proposed. In this context, we consider a simple alternative based on minimal feedback, which we call Decoupled Greedy Learning (DGL). It is based on a classic greedy relaxation of the joint training objective, recently shown to be effective for Convolutional Neural Networks (CNNs) on large-scale image classification. We consider an optimization of this objective that decouples layer training, allowing layers or modules of the network to be trained in parallel with a potentially linear speedup. Using a replay buffer, we show that this approach extends to asynchronous settings, where modules can operate and continue to update even under large communication delays. To address bandwidth and memory issues, we propose an approach based on online vector quantization, which drastically reduces both the communication bandwidth between modules and the memory required for the replay buffers. We show theoretically and empirically that this approach converges, and we compare it to sequential solvers. We demonstrate the effectiveness of DGL against alternative approaches on the CIFAR-10 dataset and on the large-scale ImageNet dataset.
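The abstract describes per-module greedy training with local auxiliary objectives and a replay buffer between modules. The sketch below is a minimal illustration of that idea, assuming PyTorch; the names (DGLModule, ReplayBuffer, train_step), the architecture, and all hyperparameters are hypothetical and not taken from the paper.

```python
# Minimal sketch of Decoupled Greedy Learning (DGL), assuming PyTorch.
# Each module is paired with a small auxiliary classifier and its own optimizer;
# a replay buffer between modules stands in for the (possibly asynchronous)
# communication channel described in the abstract. Illustrative only.
import random
import torch
import torch.nn as nn
import torch.nn.functional as F


class DGLModule(nn.Module):
    """One greedily trained block: a conv stage plus an auxiliary classifier."""

    def __init__(self, in_ch, out_ch, num_classes=10):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 3, padding=1),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(),
        )
        self.aux_head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(out_ch, num_classes)
        )

    def forward(self, x):
        h = self.block(x)
        return h, self.aux_head(h)


class ReplayBuffer:
    """Fixed-size buffer of (activation, label) pairs fed to the next module."""

    def __init__(self, capacity=256):
        self.capacity, self.data = capacity, []

    def push(self, h, y):
        if len(self.data) >= self.capacity:
            self.data.pop(0)
        self.data.append((h.detach(), y))

    def sample(self):
        return random.choice(self.data)


def train_step(module, optimizer, x, y, out_buffer=None):
    """Local update: the auxiliary loss never propagates to earlier modules."""
    optimizer.zero_grad()
    h, logits = module(x)
    loss = F.cross_entropy(logits, y)
    loss.backward()
    optimizer.step()
    if out_buffer is not None:
        out_buffer.push(h, y)  # detached activations for the next module
    return loss.item()


if __name__ == "__main__":
    # Two decoupled modules trained on random data; in the asynchronous
    # setting each loop body could run in its own process, with the buffer
    # as the only link between them.
    m1, m2 = DGLModule(3, 16), DGLModule(16, 32)
    opt1 = torch.optim.SGD(m1.parameters(), lr=0.1)
    opt2 = torch.optim.SGD(m2.parameters(), lr=0.1)
    buf = ReplayBuffer()

    for _ in range(5):
        x = torch.randn(8, 3, 32, 32)
        y = torch.randint(0, 10, (8,))
        train_step(m1, opt1, x, y, out_buffer=buf)
        h, y_h = buf.sample()            # possibly stale activations
        train_step(m2, opt2, h, y_h)
```

In the asynchronous variant described in the abstract, the two update loops would run concurrently and the sampled activations may be stale; the replay buffer (and, in the paper, its quantized form) is what absorbs that delay.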
