GPU Asynchronous Stochastic Gradient Descent to Speed Up Neural Network Training

12/21/2013
by Thomas Paine et al.

The ability to train large-scale neural networks has resulted in state-of-the-art performance in many areas of computer vision. These results have largely come from computational breakthroughs of two forms: model parallelism, e.g., GPU-accelerated training, which has seen quick adoption in computer vision circles, and data parallelism, e.g., asynchronous SGD (A-SGD), which has been applied at large scale mostly in industry. We report early experiments with a system that makes use of both model parallelism and data parallelism, which we call GPU A-SGD. We show that GPU A-SGD makes it possible to speed up the training of large convolutional neural networks useful for computer vision. We believe GPU A-SGD will make it possible to train larger networks on larger training sets in a reasonable amount of time.
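
The abstract pairs model parallelism (each model replica trained on a GPU) with data parallelism (many replicas updating shared parameters asynchronously). As a rough illustration of the data-parallel half only, below is a minimal sketch of asynchronous SGD with a central parameter store and several worker threads; the least-squares objective, thread count, and learning rate are illustrative assumptions, not details from the paper.

```python
# Minimal sketch of asynchronous SGD (A-SGD): workers pull possibly stale
# parameters, compute gradients on their own mini-batches, and push updates
# without waiting for one another. The problem setup here (noiseless linear
# regression) is an assumption chosen only to keep the example self-contained.
import threading
import numpy as np

class ParameterServer:
    """Holds the shared model parameters; workers push gradients asynchronously."""
    def __init__(self, dim, lr=0.05):
        self.w = np.zeros(dim)
        self.lr = lr
        self.lock = threading.Lock()

    def pull(self):
        with self.lock:
            return self.w.copy()

    def push(self, grad):
        # Apply a (possibly stale) gradient as soon as it arrives.
        with self.lock:
            self.w -= self.lr * grad

def worker(server, X, y, steps, batch_size=32):
    rng = np.random.default_rng()
    for _ in range(steps):
        idx = rng.choice(len(X), size=batch_size, replace=False)
        w = server.pull()                                  # read current (stale) weights
        pred = X[idx] @ w
        grad = X[idx].T @ (pred - y[idx]) / batch_size     # least-squares gradient
        server.push(grad)                                  # update without synchronizing

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    dim = 10
    X = rng.standard_normal((5000, dim))
    true_w = rng.standard_normal(dim)
    y = X @ true_w

    server = ParameterServer(dim)
    threads = [threading.Thread(target=worker, args=(server, X, y, 200))
               for _ in range(4)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print("recovered weights close to truth:",
          np.allclose(server.pull(), true_w, atol=1e-1))
```

In the GPU A-SGD setting described in the abstract, each worker would itself be a GPU-accelerated model replica rather than a thread computing a toy gradient.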

Related research

11/06/2019  DC-S3GD: Delay-Compensated Stale-Synchronous SGD for Large-Scale Decentralized Neural Network Training
Data parallelism has become the de facto standard for training Deep Neur...

05/01/2023  Performance and Energy Consumption of Parallel Machine Learning Algorithms
Machine learning models have achieved remarkable success in various real...

04/09/2018  Large scale distributed neural network training through online distillation
Techniques such as ensembling and distillation promise model quality imp...

07/14/2021  Accelerating Distributed K-FAC with Smart Parallelism of Computing and Communication Tasks
Distributed training with synchronous stochastic gradient descent (SGD) ...

05/14/2019  Robust Neural Network Training using Periodic Sampling over Model Weights
Deep neural networks provide best-in-class performance for a number of c...

06/11/2018  Gear Training: A new way to implement high-performance model-parallel training
The training of Deep Neural Networks usually needs tremendous computing ...

04/20/2017  Hard Mixtures of Experts for Large Scale Weakly Supervised Vision
Training convolutional networks (CNN's) that fit on a single GPU with mi...
