A Statistical Framework for Low-bitwidth Training of Deep Neural Networks

10/27/2020
by Jianfei Chen, et al.

Fully quantized training (FQT), which uses low-bitwidth hardware by quantizing the activations, weights, and gradients of a neural network model, is a promising approach to accelerate the training of deep neural networks. One major challenge with FQT is the lack of theoretical understanding, in particular of how gradient quantization impacts convergence properties. In this paper, we address this problem by presenting a statistical framework for analyzing FQT algorithms. We view the quantized gradient of FQT as a stochastic estimator of its full-precision counterpart, a procedure known as quantization-aware training (QAT). We show that the FQT gradient is an unbiased estimator of the QAT gradient, and we discuss the impact of gradient quantization on its variance. Inspired by these theoretical results, we develop two novel gradient quantizers, and we show that these have smaller variance than the existing per-tensor quantizer. For training ResNet-50 on ImageNet, our 5-bit block Householder quantizer achieves only 0.5% validation accuracy loss relative to QAT, comparable to the existing INT8 baseline.
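
The unbiasedness claim above rests on stochastic rounding: rounding up with probability equal to the fractional part makes the expected quantized value equal the input. The sketch below is a generic illustration of such an unbiased per-tensor gradient quantizer in NumPy, not the paper's code; the function name, bit width, and epsilon are our own choices, and the empirical check simply verifies the bias and variance behavior the abstract describes.

```python
import numpy as np

def per_tensor_quantize(g, bits=5, rng=None):
    """Unbiased per-tensor gradient quantizer (illustrative sketch).

    A single scale is shared across the whole tensor; stochastic rounding
    guarantees E[dequantize(quantize(g))] = g, so the quantized gradient
    is an unbiased estimator of the full-precision one.
    """
    rng = np.random.default_rng() if rng is None else rng
    levels = 2 ** bits - 1                             # number of quantization steps
    scale = np.max(np.abs(g)) / (levels / 2) + 1e-12   # shared (per-tensor) step size
    x = g / scale + levels / 2                         # map g into [0, levels]
    lo = np.floor(x)
    q = lo + (rng.random(x.shape) < (x - lo))          # round up w.p. frac(x): unbiased
    return (q - levels / 2) * scale                    # dequantize back to float

# Empirical check: averaging many quantized draws recovers g (no bias),
# while the per-element variance is set by the shared step size.
g = np.random.default_rng(0).normal(size=(4, 8))
draws = np.stack([per_tensor_quantize(g, rng=np.random.default_rng(s))
                  for s in range(2000)])
print("max |bias|   :", np.abs(draws.mean(axis=0) - g).max())  # close to 0
print("mean variance:", draws.var(axis=0).mean())
```

Because the single shared scale must cover the full dynamic range of the tensor, the variance of this baseline grows with that range; reducing it is the motivation the abstract gives for the two proposed quantizers, including the block Householder quantizer.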

Related research

12/19/2021
Logarithmic Unbiased Quantization: Practical 4-bit Training in Deep Learning
Quantization of the weights and activations is one of the main methods t...

04/29/2018
UNIQ: Uniform Noise Injection for the Quantization of Neural Networks
We present a novel method for training deep neural network amenable to i...

05/15/2021
On the Distributional Properties of Adaptive Gradients
Adaptive gradient methods have achieved remarkable success in training d...

08/16/2021
Distance-aware Quantization
We address the problem of network quantization, that is, reducing bit-wi...

09/10/2020
QuantNet: Learning to Quantize by Learning within Fully Differentiable Framework
Despite the achievements of recent binarization methods on reducing the ...

02/25/2020
Optimal Gradient Quantization Condition for Communication-Efficient Distributed Training
The communication of gradients is costly for training deep neural networ...

04/02/2021
Network Quantization with Element-wise Gradient Scaling
Network quantization aims at reducing bit-widths of weights and/or activ...
