Mixed-Precision Neural Network Quantization via Learned Layer-wise Importance

03/16/2022
by Chen Tang, et al.

The exponentially large discrete search space in mixed-precision quantization (MPQ) makes it hard to determine the optimal bit-width for each layer. Previous works usually resort to iterative search methods on the training set, which consume hundreds or even thousands of GPU-hours. In this study, we reveal that some unique learnable parameters in quantization, namely the scale factors in the quantizer, can serve as importance indicators of a layer, reflecting the contribution of that layer to the final accuracy at certain bit-widths. These importance indicators naturally perceive the numerical transformation during quantization-aware training, and can therefore provide accurate layer-wise quantization sensitivity metrics. However, a deep network always contains hundreds of such indicators, and training them one by one would lead to excessive time cost. To overcome this issue, we propose a joint training scheme that obtains all indicators at once, considerably speeding up indicator training by parallelizing the original sequential training processes. With these learned importance indicators, we formulate the MPQ search problem as a one-time integer linear programming (ILP) problem. This avoids iterative search and significantly reduces search time without limiting the bit-width search space. For example, MPQ search on ResNet18 with our indicators takes only 0.06 seconds. Extensive experiments also show that our approach achieves SOTA accuracy on ImageNet for a wide range of models under various constraints (e.g., BitOps, compression rate).
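To make the one-time ILP search concrete, the sketch below shows how such a formulation could be set up with the PuLP library, assuming the learned importance scores and per-layer BitOps costs are already available. All numbers, variable names, and the exact objective form (maximizing total importance under a BitOps budget) are illustrative assumptions, not the paper's actual values or code.

```python
# Minimal sketch of a one-time ILP search for a mixed-precision policy.
# The importance scores and per-layer BitOps below are made-up placeholders,
# not values produced by the paper's learned indicators.
import pulp

num_layers = 4
bit_choices = [2, 4, 8]

# Hypothetical importance of quantizing layer l to bit-width b
# (higher = this choice contributes more to final accuracy).
importance = {(l, b): 0.1 * (l + 1) * b
              for l in range(num_layers) for b in bit_choices}

# Hypothetical BitOps cost of running layer l at bit-width b.
bitops = {(l, b): (l + 1) * b * b
          for l in range(num_layers) for b in bit_choices}
bitops_budget = 150

prob = pulp.LpProblem("mpq_search", pulp.LpMaximize)

# x[l][b] = 1 if layer l is quantized to bit-width b.
x = pulp.LpVariable.dicts("assign", (range(num_layers), bit_choices),
                          cat=pulp.LpBinary)

# Objective: maximize total importance of the chosen assignment.
prob += pulp.lpSum(importance[l, b] * x[l][b]
                   for l in range(num_layers) for b in bit_choices)

# Each layer gets exactly one bit-width.
for l in range(num_layers):
    prob += pulp.lpSum(x[l][b] for b in bit_choices) == 1

# Stay within the BitOps budget.
prob += pulp.lpSum(bitops[l, b] * x[l][b]
                   for l in range(num_layers) for b in bit_choices) <= bitops_budget

prob.solve(pulp.PULP_CBC_CMD(msg=False))
policy = {l: next(b for b in bit_choices if pulp.value(x[l][b]) > 0.5)
          for l in range(num_layers)}
print(policy)  # chosen bit-width per layer index
```

Because the problem has only (number of layers) x (number of candidate bit-widths) binary variables, an off-the-shelf solver returns the policy almost instantly, which is consistent with the sub-second search times reported in the abstract.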

Related research

09/16/2021  OMPQ: Orthogonal Mixed Precision Quantization
To bridge the ever increasing gap between deep neural networks' complexi...

12/20/2022  CSMPQ: Class Separability Based Mixed-Precision Quantization
Mixed-precision quantization has received increasing attention for its c...

05/11/2023  Patch-wise Mixed-Precision Quantization of Vision Transformer
As emerging hardware begins to support mixed bit-width arithmetic comput...

03/04/2021  Effective and Fast: A Novel Sequential Single Path Search for Mixed-Precision Quantization
Since model quantization helps to reduce the model size and computation ...

04/21/2022  Arbitrary Bit-width Network: A Joint Layer-Wise Quantization and Adaptive Inference Approach
Conventional model quantization methods use a fixed quantization scheme ...

07/20/2023  EMQ: Evolving Training-free Proxies for Automated Mixed Precision Quantization
Mixed-Precision Quantization (MQ) can achieve a competitive accuracy-com...

10/27/2022  Neural Networks with Quantization Constraints
Enabling low precision implementations of deep learning models, without ...