Efficient Training of Convolutional Neural Nets on Large Distributed Systems

11/02/2017
by   Sameer Kumar, et al.
0

Deep Neural Networks (DNNs) have achieved im- pressive accuracy in many application domains including im- age classification. Training of DNNs is an extremely compute- intensive process and is solved using variants of the stochastic gradient descent (SGD) algorithm. A lot of recent research has focussed on improving the performance of DNN training. In this paper, we present optimization techniques to improve the performance of the data parallel synchronous SGD algorithm using the Torch framework: (i) we maintain data in-memory to avoid file I/O overheads, (ii) we present a multi-color based MPI Allreduce algorithm to minimize communication overheads, and (iii) we propose optimizations to the Torch data parallel table framework that handles multi-threading. We evaluate the performance of our optimizations on a Power 8 Minsky cluster with 32 nodes and 128 NVidia Pascal P100 GPUs. With our optimizations, we are able to train 90 epochs of the ResNet-50 model on the Imagenet-1k dataset using 256 GPUs in just 48 minutes. This significantly improves on the previously best known performance of training 90 epochs of the ResNet-50 model on the same dataset using 256 GPUs in 65 minutes. To the best of our knowledge, this is the best known training performance demonstrated for the Imagenet- 1k dataset.

READ FULL TEXT
research
07/30/2018

Highly Scalable Deep Learning Training System with Mixed-Precision: Training ImageNet in Four Minutes

Synchronized stochastic gradient descent (SGD) optimizers with data para...
research
11/12/2017

Extremely Large Minibatch SGD: Training ResNet-50 on ImageNet in 15 Minutes

We demonstrate that training ResNet-50 on ImageNet for 90 epochs can be ...
research
03/15/2018

GossipGraD: Scalable Deep Learning using Gossip Communication based Asynchronous Gradient Descent

In this paper, we present GossipGraD - a gossip communication protocol b...
research
03/16/2019

swCaffe: a Parallel Framework for Accelerating Deep Learning Applications on Sunway TaihuLight

This paper reports our efforts on swCaffe, a highly efficient parallel f...
research
10/28/2018

A Hitchhiker's Guide On Distributed Training of Deep Neural Networks

Deep learning has led to tremendous advancements in the field of Artific...
research
09/14/2017

ImageNet Training in Minutes

Finishing 90-epoch ImageNet-1k training with ResNet-50 on a NVIDIA M40 G...
research
11/12/2019

Throughput Prediction of Asynchronous SGD in TensorFlow

Modern machine learning frameworks can train neural networks using multi...

Please sign up or login with your details

Forgot password? Click here to reset