Training with Quantization Noise for Extreme Model Compression

04/15/2020
by Angela Fan, et al.

We tackle the problem of producing compact models, maximizing their accuracy for a given model size. A standard solution is to train networks with Quantization Aware Training, where the weights are quantized during training and the gradients approximated with the Straight-Through Estimator (STE). In this paper, we extend this approach to work with extreme compression methods, where the approximations introduced by STE are severe. Our proposal is to quantize only a different random subset of weights during each forward pass, allowing unbiased gradients to flow through the other weights. Controlling the amount of noise and its form allows for extreme compression rates while maintaining the performance of the original model. As a result, we establish new state-of-the-art trade-offs between accuracy and model size in both natural language processing and image classification. For example, applying our method to state-of-the-art Transformer and ConvNet architectures, we can achieve 82.5% accuracy on MNLI by compressing RoBERTa to 14MB, and 80.0% top-1 accuracy on ImageNet by compressing an EfficientNet-B3 to 3.3MB.
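The core idea of the abstract, applying quantization noise to only a random subset of weights on each forward pass so the remaining weights receive unbiased gradients, can be sketched as follows. This is a minimal NumPy illustration, not the paper's implementation: the `fake_int8_quantize` helper and the fixed `scale` are illustrative assumptions (the paper also covers stronger schemes such as product quantization), and the straight-through gradient path is only described in comments since NumPy has no autograd.

```python
import numpy as np

def fake_int8_quantize(w, scale):
    # Round-to-nearest int8 quantization, then dequantize back to float.
    # In real training, gradients through this op would be approximated
    # with the Straight-Through Estimator (identity on the backward pass).
    q = np.clip(np.round(w / scale), -128, 127)
    return q * scale

def quant_noise(w, p, scale, rng):
    # Quant-Noise forward pass (sketch): quantize only a random fraction p
    # of the weights; the rest stay full precision, so their gradients are
    # exact rather than STE-approximated. A fresh mask is drawn each pass.
    mask = rng.random(w.shape) < p
    return np.where(mask, fake_int8_quantize(w, scale), w)

rng = np.random.default_rng(0)
w = rng.standard_normal((4, 4)).astype(np.float32)
w_noised = quant_noise(w, p=0.5, scale=0.05, rng=rng)
```

At inference time, all weights would be quantized (p = 1); the partial noise is a training-time device to keep the gradient signal mostly unbiased while the network adapts to the quantization error.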


Related research

03/10/2021 · Quantization-Guided Training for Compact TinyML Models
We propose a Quantization Guided Training (QGT) method to guide DNN trai...

12/11/2022 · Error-aware Quantization through Noise Tempering
Quantization has become a predominant approach for model compression, en...

10/29/2020 · Permute, Quantize, and Fine-tune: Efficient Compression of Neural Networks
Compressing large neural networks is an important step for their deploym...

06/24/2023 · Partitioning-Guided K-Means: Extreme Empty Cluster Resolution for Extreme Model Compression
Compactness in deep learning can be critical to a model's viability in l...

04/20/2021 · Differentiable Model Compression via Pseudo Quantization Noise
We propose to add independent pseudo quantization noise to model paramet...

11/01/2018 · Online Embedding Compression for Text Classification using Low Rank Matrix Factorization
Deep learning models have become state of the art for natural language p...
