Auto-Precision Scaling for Distributed Deep Learning

11/20/2019
by Ruobing Han, et al.

In recent years, large-batch optimization has become key to distributed deep learning. However, large-batch optimization is hard: straightforwardly porting small-batch code often leads to a significant loss in test accuracy. Some researchers have suggested that large-batch optimization degrades generalization performance, and further conjectured that large-batch training needs higher floating-point precision to generalize well. To examine this, we conduct an open study in this paper. Our goal is to find the number of bits that large-batch training actually needs. Doing so requires a system for customized-precision experiments, but state-of-the-art systems have limitations that reduce the efficiency of developers and researchers. We therefore design and implement our own system, CPD: A High Performance System for Customized-Precision Distributed DL. In our experiments, applications often lose accuracy at very low precision (e.g., 8 bits or 4 bits). To address this, we propose the APS (Auto-Precision-Scaling) algorithm, a layer-wise adaptive scheme for gradient shifting. With APS, we are able to make large-batch training converge with only 4 bits.
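To make the layer-wise gradient-shifting idea concrete, here is a minimal sketch in plain NumPy. The function name `aps_quantize_layer` and the power-of-two shift rule are our own illustrative assumptions for this sketch, not the paper's exact formulation: each layer's gradient is shifted by a per-layer factor so its values fit the representable range of a very-low-precision format, quantized, and then descaled afterwards.

```python
import numpy as np

def aps_quantize_layer(grad, num_bits=4):
    """Illustrative layer-wise adaptive gradient shifting before low-bit quantization.

    The shift rule below (a power of two derived from the layer's maximum
    absolute gradient) is an assumption for this sketch, not the paper's formula.
    """
    max_abs = np.max(np.abs(grad))
    if max_abs == 0:
        return grad.copy(), 1.0
    # Per-layer shift: a power of two that moves the gradients toward [-1, 1].
    shift = 2.0 ** np.floor(np.log2(1.0 / max_abs))

    # Quantize the shifted gradient onto a signed fixed-point grid with num_bits.
    levels = 2 ** (num_bits - 1) - 1
    q = np.round(grad * shift * levels) / levels

    # Return the low-precision payload and the factor needed to undo the shift.
    return q, shift

# Usage: quantize each layer's gradient independently, then descale after the
# (simulated) all-reduce so the optimizer sees values on the original scale.
rng = np.random.default_rng(0)
layer_grads = {"conv1": rng.normal(0, 1e-3, 64), "fc": rng.normal(0, 1e-1, 10)}
for name, g in layer_grads.items():
    q, s = aps_quantize_layer(g, num_bits=4)
    restored = q / s
    print(name, "max quantization error:", np.max(np.abs(restored - g)))
```

The point of the per-layer shift is that layers whose gradients differ by orders of magnitude (as with `conv1` vs. `fc` above) would otherwise be crushed to zero or clipped by a single global scale at 4 bits.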
