Towards Mixed-Precision Quantization of Neural Networks via Constrained Optimization

10/13/2021
by Weihan Chen, et al.

Quantization is a widely used technique to compress and accelerate deep neural networks. However, conventional quantization methods use the same bit-width for all (or most of) the layers, which often leads to significant accuracy degradation in the ultra-low-precision regime and ignores the fact that emergent hardware accelerators have begun to support mixed-precision computation. In this paper, we present a novel and principled framework to solve the mixed-precision quantization problem. Briefly, we first formulate mixed-precision quantization as a discrete constrained optimization problem. Then, to make the optimization tractable, we approximate the objective function with a second-order Taylor expansion and propose an efficient approach to compute its Hessian matrix. Finally, based on these simplifications, we show that the original problem can be reformulated as a Multiple-Choice Knapsack Problem (MCKP) and propose a greedy search algorithm to solve it efficiently. Compared with existing mixed-precision quantization works, our method is derived in a principled way and is much more computationally efficient. Moreover, extensive experiments on the ImageNet dataset and a variety of network architectures demonstrate its superiority over existing uniform and mixed-precision quantization approaches.
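The MCKP view described above can be made concrete with a small sketch: each layer picks exactly one bit-width from a candidate set, the "weight" is the layer's storage cost, and the "value" is a sensitivity-based proxy for the loss increase. The sketch below is an illustrative greedy heuristic, not the paper's exact algorithm; the candidate bit-widths, the `sensitivity / 4**bits` loss proxy, and the example layer sizes are all assumptions for demonstration.

```python
def greedy_mixed_precision(layers, budget_bits, candidates=(8, 4, 2)):
    """Assign one bit-width per layer under a total size budget.

    layers: list of (num_params, sensitivity) pairs, where sensitivity
        is a Hessian-style proxy for how much quantization error in
        that layer perturbs the loss (illustrative assumption).
    budget_bits: maximum total model size in bits.

    Greedy rule: start every layer at the highest precision, then
    repeatedly downgrade the layer with the smallest loss increase
    per bit of storage saved until the budget is met.
    """
    def loss(sens, bits):
        # Squared quantization error shrinks roughly 4x per extra bit.
        return sens / (4 ** bits)

    choice = [0] * len(layers)  # index into candidates, 0 = highest precision

    def total_bits():
        return sum(n * candidates[c] for (n, _), c in zip(layers, choice))

    while total_bits() > budget_bits:
        best, best_ratio = None, None
        for i, (n, s) in enumerate(layers):
            c = choice[i]
            if c + 1 >= len(candidates):
                continue  # already at lowest precision
            dloss = loss(s, candidates[c + 1]) - loss(s, candidates[c])
            dsize = n * (candidates[c] - candidates[c + 1])
            ratio = dloss / dsize
            if best_ratio is None or ratio < best_ratio:
                best, best_ratio = i, ratio
        if best is None:
            break  # budget infeasible even at the lowest precision
        choice[best] += 1

    return [candidates[c] for c in choice]
```

For example, with two layers of 1000 parameters each, where the first is 100x more sensitive than the second, and a budget of 12000 bits, the greedy rule keeps the sensitive layer at 8 bits and drops the insensitive one to 4 bits.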

Related research:

- Efficient Bitwidth Search for Practical Mixed Precision Neural Network (03/17/2020): Network quantization has rapidly become one of the most widely used meth...
- Neural Networks with Quantization Constraints (10/27/2022): Enabling low precision implementations of deep learning models, without ...
- Patch-wise Mixed-Precision Quantization of Vision Transformer (05/11/2023): As emerging hardware begins to support mixed bit-width arithmetic comput...
- Mixed-Precision Quantization with Cross-Layer Dependencies (07/11/2023): Quantization is commonly used to compress and accelerate deep neural net...
- Mixed-Precision Neural Networks: A Survey (08/11/2022): Mixed-precision Deep Neural Networks achieve the energy efficiency and t...
- Differentiable Dynamic Quantization with Mixed Precision and Adaptive Resolution (06/04/2021): Model quantization is challenging due to many tedious hyper-parameters s...
- Neuroevolution-Enhanced Multi-Objective Optimization for Mixed-Precision Quantization (06/14/2021): Mixed-precision quantization is a powerful tool to enable memory and com...
