Pareto-Optimal Quantized ResNet Is Mostly 4-bit

05/07/2021
by Amirali Abdolrashidi, et al.

Quantization has become a popular technique to compress neural networks and reduce compute cost, but most prior work focuses on studying quantization without changing the network size. Many real-world applications of neural networks have compute cost and memory budgets, which can be traded off with model quality by changing the number of parameters. In this work, we use ResNet as a case study to systematically investigate the effects of quantization on inference compute cost-quality tradeoff curves. Our results suggest that for each bfloat16 ResNet model, there are quantized models with lower cost and higher accuracy; in other words, the bfloat16 compute cost-quality tradeoff curve is Pareto-dominated by the 4-bit and 8-bit curves, with models primarily quantized to 4-bit yielding the best Pareto curve. Furthermore, we achieve state-of-the-art results on ImageNet for 4-bit ResNet-50 with quantization-aware training, obtaining a top-1 eval accuracy of 77.09%. We also demonstrate the regularizing effect of quantization by measuring the generalization gap. The quantization method we used is optimized for practicality: it requires little tuning and is designed with hardware capabilities in mind. Our work motivates further research into optimal numeric formats for quantization, as well as the development of machine learning accelerators supporting these formats. As part of this work, we contribute a quantization library written in JAX, which is open-sourced at https://github.com/google-research/google-research/tree/master/aqt.
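Quantization-aware training of the kind described above works by simulating low-precision arithmetic in the forward pass while keeping gradients in high precision. The sketch below shows one common way to do this in JAX: symmetric per-tensor "fake" quantization with a straight-through estimator. The function name `fake_quant`, the per-tensor scaling choice, and the 4-bit setting are illustrative assumptions for this example, not the actual API of the open-sourced AQT library linked above.

```python
# Minimal sketch of symmetric fake quantization with a straight-through
# estimator (STE), as commonly used in quantization-aware training.
# Assumption: per-tensor scaling and signed integer ranges; the AQT library
# referenced in the abstract may use a different scheme.
import jax
import jax.numpy as jnp


def fake_quant(x, bits=4):
    """Simulate `bits`-bit symmetric quantization of a tensor during training."""
    # Largest representable signed-integer magnitude, e.g. 7 for 4 bits.
    max_int = 2 ** (bits - 1) - 1
    # Per-tensor scale mapping the largest absolute value onto max_int.
    scale = jnp.maximum(jnp.max(jnp.abs(x)) / max_int, 1e-9)
    # Quantize: round onto the integer grid, then clip to the valid range.
    q = jnp.clip(jnp.round(x / scale), -max_int, max_int)
    x_q = q * scale
    # STE: the forward pass sees the quantized values, while the backward
    # pass treats rounding and clipping as the identity function.
    return x + jax.lax.stop_gradient(x_q - x)


# Usage: quantize the weights of a single linear layer inside a loss.
k1, k2 = jax.random.split(jax.random.PRNGKey(0))
w = jax.random.normal(k1, (64, 64))
x = jax.random.normal(k2, (8, 64))


def loss_fn(w, x):
    y = x @ fake_quant(w, bits=4)  # weights simulated at 4-bit precision
    return jnp.mean(y ** 2)


grads = jax.grad(loss_fn)(w, x)  # gradients flow through the STE
```

In this setup the rounding error is exposed to the loss during training, which is one way to view the regularizing effect of quantization mentioned in the abstract; deployment on low-precision hardware would then use the integer values directly rather than the dequantized floats.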


