Kernel Quantization for Efficient Network Compression

03/11/2020
by Zhongzhi Yu, et al.

This paper presents Kernel Quantization (KQ), a novel network compression framework that efficiently converts any pre-trained full-precision convolutional neural network (CNN) into a low-precision version without significant performance loss. Unlike existing methods, which struggle to push the weight bit-length lower, KQ can improve the compression ratio by treating the convolution kernel, rather than the individual weight, as the quantization unit. Inspired by the evolution from weight pruning to filter pruning, we propose to quantize at both the kernel and the weight level. Instead of representing each weight parameter with a low-bit index, we learn a kernel codebook and replace every kernel in a convolution layer with its corresponding low-bit index. KQ thus represents the weight tensor of a convolution layer with low-bit indexes plus a kernel codebook of limited size, which enables a significantly higher compression ratio. We then apply 6-bit parameter quantization to the kernel codebook to further reduce redundancy. Extensive experiments on the ImageNet classification task show that KQ needs only 1.05 and 1.62 bits on average per convolution-layer parameter in VGG and ResNet18, respectively, achieving state-of-the-art compression ratios with little accuracy loss.
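To make the kernel-level idea concrete, below is a minimal sketch of codebook-based kernel quantization, assuming the codebook is learned by k-means clustering over flattened 3x3 kernels and that the codebook itself is then uniformly quantized to 6 bits. The function names (`build_kernel_codebook`, `quantize_codebook`, `reconstruct`) and the k-means choice are illustrative assumptions, not the paper's exact procedure.

```python
# Sketch of kernel-level codebook quantization (illustrative, not the paper's code).
import numpy as np
from sklearn.cluster import KMeans

def build_kernel_codebook(weight, num_kernels=256):
    """Cluster all k x k kernels of a conv layer into a shared codebook.

    weight: array of shape (out_ch, in_ch, k, k)
    Returns (codebook, indexes): every kernel is replaced by a
    log2(num_kernels)-bit index into the shared codebook.
    """
    out_ch, in_ch, k, _ = weight.shape
    kernels = weight.reshape(out_ch * in_ch, k * k)              # one row per kernel
    km = KMeans(n_clusters=num_kernels, n_init=10, random_state=0).fit(kernels)
    codebook = km.cluster_centers_.reshape(num_kernels, k, k)    # shared kernel dictionary
    indexes = km.labels_.reshape(out_ch, in_ch)                  # low-bit index per kernel
    return codebook, indexes

def quantize_codebook(codebook, bits=6):
    """Uniformly quantize codebook parameters, mirroring the 6-bit step on top of KQ."""
    scale = np.abs(codebook).max() / (2 ** (bits - 1) - 1)
    return np.round(codebook / scale) * scale

def reconstruct(codebook, indexes):
    """Rebuild the approximate full-precision weight tensor from indexes."""
    return codebook[indexes]

# Example: a conv layer with 64 output and 64 input channels of 3x3 kernels.
w = np.random.randn(64, 64, 3, 3).astype(np.float32)
cb, idx = build_kernel_codebook(w, num_kernels=256)   # 8-bit index per kernel
cb = quantize_codebook(cb, bits=6)
w_hat = reconstruct(cb, idx)
print("reconstruction MSE:", np.mean((w - w_hat) ** 2))
```

In this sketch the storage cost per kernel is one index plus a share of the 6-bit codebook, which is how a per-parameter cost well below 2 bits becomes possible; in practice the reported ratios also depend on the fine-tuning and codebook-learning details described in the paper.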

