Towards the Limit of Network Quantization

12/05/2016
by Yoojin Choi, et al.

Network quantization is one of the network compression techniques for reducing the redundancy of deep neural networks: it reduces the number of distinct network parameter values, thereby reducing the storage required for them. In this paper, we design network quantization schemes that minimize the performance loss due to quantization under a compression ratio constraint. We analyze the quantitative relation of quantization errors to the neural network loss function and show that the Hessian-weighted distortion measure is, locally, the right objective function for optimizing network quantization. Hessian-weighted k-means clustering is therefore proposed for clustering the network parameters to quantize. When optimal variable-length binary codes, e.g., Huffman codes, are employed for further compression, we show that the network quantization problem can be related to the entropy-constrained scalar quantization (ECSQ) problem in information theory, and we consequently propose two ECSQ solutions for network quantization: uniform quantization and an iterative solution similar to Lloyd's algorithm. Finally, using simple uniform quantization followed by Huffman coding, our experiments show that compression ratios of 51.25, 22.17 and 40.65 are achievable for LeNet, 32-layer ResNet and AlexNet, respectively.
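To make the proposed clustering concrete, below is a minimal NumPy sketch of Hessian-weighted k-means: one-dimensional k-means in which each parameter is weighted by an approximation of the corresponding diagonal Hessian entry of the loss. The function name, the random initialization, and the convergence test are illustrative assumptions, not the paper's reference implementation.

```python
import numpy as np

def hessian_weighted_kmeans(weights, hessian_diag, num_clusters, num_iters=100, seed=0):
    """Cluster 1-D parameters to minimize sum_i h_i * (w_i - c_{k(i)})^2,
    where h_i >= 0 approximates the i-th diagonal Hessian entry of the loss.
    Since h_i does not depend on the cluster, the assignment step is the
    usual nearest-center rule; only the center update changes."""
    rng = np.random.default_rng(seed)
    centers = rng.choice(weights, size=num_clusters, replace=False)
    assign = None
    for _ in range(num_iters):
        # Assignment step: nearest center (the Hessian weighting drops out here).
        new_assign = np.argmin(np.abs(weights[:, None] - centers[None, :]), axis=1)
        if assign is not None and np.array_equal(assign, new_assign):
            break  # converged: assignments unchanged
        assign = new_assign
        # Update step: each center becomes the Hessian-weighted mean of its cluster.
        for k in range(num_clusters):
            mask = assign == k
            if mask.any():
                centers[k] = np.sum(hessian_diag[mask] * weights[mask]) / np.sum(hessian_diag[mask])
    return centers, assign  # quantized weights are centers[assign]
```

The simple pipeline used in the experiments, uniform quantization followed by Huffman coding, can be sketched similarly. The step size below is a hypothetical placeholder, and the helper computes Huffman code lengths only, which is enough to estimate the achievable bit rate; codebook storage overhead is ignored in this sketch.

```python
import heapq
from collections import Counter
import numpy as np

def uniform_quantize(weights, step):
    """Snap each weight to the nearest multiple of `step`; the returned
    integer indices identify the shared quantization levels."""
    return np.round(weights / step).astype(np.int64)

def huffman_code_lengths(symbols):
    """Return {symbol: Huffman code length in bits} for the given sequence."""
    freq = Counter(symbols)
    if len(freq) == 1:  # degenerate case: a single level still needs one bit
        return {next(iter(freq)): 1}
    # Heap nodes: (total frequency, unique tiebreaker, {symbol: depth so far}).
    heap = [(f, i, {s: 0}) for i, (s, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        f1, _, n1 = heapq.heappop(heap)
        f2, _, n2 = heapq.heappop(heap)
        # Merging two subtrees pushes every contained symbol one level deeper.
        merged = {s: d + 1 for s, d in {**n1, **n2}.items()}
        heapq.heappush(heap, (f1 + f2, next_id, merged))
        next_id += 1
    return heap[0][2]

# Usage (hypothetical step size): estimate bits per weight and the
# compression ratio relative to 32-bit floating-point parameters.
# indices = uniform_quantize(w, step=1e-2)
# lengths = huffman_code_lengths(indices.tolist())
# avg_bits = sum(lengths[s] for s in indices.tolist()) / indices.size
# ratio = 32.0 / avg_bits
```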
