CSMPQ: Class Separability Based Mixed-Precision Quantization

12/20/2022
by Mingkai Wang, et al.

Mixed-precision quantization has received increasing attention for its ability to reduce computational cost and speed up inference. Existing methods usually focus on the sensitivity of different network layers, which requires a time-consuming search or training process. To address this, a novel mixed-precision quantization method, termed CSMPQ, is proposed. Specifically, the TF-IDF metric, widely used in natural language processing (NLP), is introduced to measure the class separability of layer-wise feature maps. A linear programming problem is then formulated to derive the optimal bit configuration for each layer. Without any iterative process, CSMPQ achieves better compression trade-offs than state-of-the-art quantization methods: 73.03% Top-1 accuracy on ResNet-18 with only 59G BOPs for QAT, and 71.30% Top-1 accuracy with only 1.5 Mb on MobileNetV2 for PTQ.
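To make the two steps concrete, the sketch below is an illustrative Python mock-up, not the authors' released code: the helpers layer_separability and allocate_bits, the exact TF-IDF construction over feature maps, and the relaxed linear-programming formulation are all assumptions made for exposition. It shows how a TF-IDF-style class-separability score per layer could feed a linear program that assigns layer-wise bit-widths under a model-size budget.

import numpy as np
from scipy.optimize import linprog

def layer_separability(feats, labels, eps=1e-8):
    """TF-IDF-style class-separability score for one layer (illustrative).

    feats:  (N, C) pooled feature maps for N samples and C channels.
    labels: (N,) integer class labels.
    Each class is treated as a "document" and each channel as a "term".
    """
    classes = np.unique(labels)
    # term frequency: mean activation magnitude of each channel per class
    tf = np.abs(np.stack([feats[labels == c].mean(axis=0) for c in classes]))  # (K, C)
    # inverse document frequency: channels active in few classes score higher
    df = (tf > tf.mean()).sum(axis=0) + eps                                     # (C,)
    idf = np.log(len(classes) / df)
    tfidf = tf * idf                                                            # (K, C)
    # larger spread of TF-IDF vectors across classes -> more class-separable layer
    return float(tfidf.var(axis=0).mean())

def allocate_bits(scores, layer_params, budget_bits, choices=(2, 4, 8)):
    """Relaxed LP over one-hot bit choices, rounded per layer (illustrative).

    Maximizes sum_l score_l * bit_l subject to sum_l params_l * bit_l <= budget_bits,
    i.e. layers with higher separability scores are favored for higher precision.
    """
    L, B = len(scores), len(choices)
    # variables x[l, b] in [0, 1]: fraction of layer l assigned bit-width choices[b]
    c = -np.repeat(scores, B) * np.tile(choices, L)            # maximize -> negate
    A_ub = (np.repeat(layer_params, B) * np.tile(choices, L))[None, :]
    A_eq = np.kron(np.eye(L), np.ones((1, B)))                 # each layer's weights sum to 1
    res = linprog(c, A_ub=A_ub, b_ub=[budget_bits],
                  A_eq=A_eq, b_eq=np.ones(L), bounds=(0, 1), method="highs")
    x = res.x.reshape(L, B)
    return [choices[i] for i in x.argmax(axis=1)]              # round to a concrete bit-width

The scores from layer_separability would be computed once from a small calibration batch, and the bit-widths returned by allocate_bits could then be handed to any standard QAT or PTQ pipeline, which matches the abstract's claim that no iterative search or retraining is needed to pick the configuration.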

