CSQ: Growing Mixed-Precision Quantization Scheme with Bi-level Continuous Sparsification

12/06/2022
by Lirui Xiao, et al.

Mixed-precision quantization has been widely applied to deep neural networks (DNNs), as it leads to significantly better efficiency-accuracy tradeoffs than uniform quantization. However, determining the exact precision of each layer remains challenging. Previous attempts at bit-level regularization and pruning-based dynamic precision adjustment during training suffer from noisy gradients and unstable convergence. In this work, we propose Continuous Sparsification Quantization (CSQ), a bit-level training method that searches for mixed-precision quantization schemes with improved stability. CSQ stabilizes the bit-level mixed-precision training process with a bi-level, gradual continuous sparsification applied both to the bit values of the quantized weights and to the bit selection that determines the quantization precision of each layer. The continuous sparsification scheme enables fully differentiable training without gradient approximation while yielding an exact quantized model in the end. A budget-aware regularization of the total model size enables the dynamic growth and pruning of each layer's precision toward a mixed-precision quantization scheme of the desired size. Extensive experiments show that CSQ achieves a better efficiency-accuracy tradeoff than previous methods on multiple models and datasets.
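To make the bi-level idea concrete, below is a minimal PyTorch sketch based only on the abstract, not the authors' implementation. It assumes that continuous sparsification is realized with temperature-annealed sigmoid gates: one set of gates on the per-weight bit values and one per-layer set selecting which bit positions (i.e. the precision) are kept, plus a differentiable proxy for model size that a budget-aware regularizer can penalize. All names (ContinuousSparsifiedBitWeight, max_bits, temperature, expected_bits, target_avg_bits) are illustrative assumptions.

```python
import torch
import torch.nn as nn

class ContinuousSparsifiedBitWeight(nn.Module):
    """Illustrative sketch (not the authors' code).

    Each weight is reconstructed from a sum of bit planes. Two gate sets are
    relaxed with temperature-annealed sigmoids (continuous sparsification):
      * per-weight gates on the bit values themselves, and
      * per-layer gates selecting which bit positions are kept, i.e. the
        layer's precision.
    Annealing the temperature toward zero pushes both gate sets to {0, 1},
    so the final model is exactly quantized. Signs and scaling factors are
    omitted for brevity.
    """

    def __init__(self, weight_shape, max_bits=8):
        super().__init__()
        self.max_bits = max_bits
        # Logits for the value of every bit of every weight (level 1).
        self.bit_logits = nn.Parameter(torch.zeros(*weight_shape, max_bits))
        # Logits for keeping/pruning each bit position of the layer (level 2).
        self.select_logits = nn.Parameter(torch.zeros(max_bits))

    def forward(self, temperature: float) -> torch.Tensor:
        # Soft bits in (0, 1); lowering the temperature hardens them.
        bits = torch.sigmoid(self.bit_logits / temperature)
        keep = torch.sigmoid(self.select_logits / temperature)
        scale = 2.0 ** torch.arange(self.max_bits, dtype=bits.dtype)
        # A selected bit at position i contributes 2^i to the weight.
        return (bits * keep * scale).sum(dim=-1)

    def expected_bits(self, temperature: float) -> torch.Tensor:
        # Differentiable proxy for the layer's precision, usable in a
        # budget-aware model-size regularizer.
        return torch.sigmoid(self.select_logits / temperature).sum()


# Toy usage: a budget-aware penalty on the total (soft) model size.
layers = [ContinuousSparsifiedBitWeight((64, 32)),
          ContinuousSparsifiedBitWeight((128, 64))]
temperature, target_avg_bits = 0.5, 4.0
num_weights = [l.bit_logits[..., 0].numel() for l in layers]
model_bits = sum(n * l.expected_bits(temperature)
                 for n, l in zip(num_weights, layers))
budget = target_avg_bits * sum(num_weights)
size_penalty = torch.relu(model_bits - budget)  # penalize exceeding the budget
```

In a full training loop one would presumably anneal the temperature on a schedule and add the size penalty to the task loss, so that layer precisions grow or shrink toward the desired budget; the exact schedule and penalty form used by CSQ are not specified in the abstract.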

