Data-Free Quantization via Mixed-Precision Compensation without Fine-Tuning

07/02/2023
by Jun Chen, et al.

Neural network quantization is a promising approach to model compression, but the resulting accuracy typically depends on a training/fine-tuning process that requires the original data. This not only incurs heavy computation and time costs but also hinders the protection of privacy and sensitive information. Consequently, several recent works have focused on data-free quantization. However, data-free quantization does not perform well at ultra-low precision. Although researchers have partially addressed this problem with generative methods that synthesize data, data synthesis itself demands substantial computation and time. In this paper, we propose a data-free mixed-precision compensation (DF-MPC) method that recovers the performance of an ultra-low-precision quantized model without any data or fine-tuning. Assuming that the quantization error caused by a low-precision quantized layer can be restored through the reconstruction of a high-precision quantized layer, we mathematically formulate the reconstruction loss between the pre-trained full-precision model and its layer-wise mixed-precision quantized counterpart. From this formulation, we derive a closed-form solution that minimizes the reconstruction loss of the feature maps. Since DF-MPC requires no original or synthetic data, it approximates the full-precision model more efficiently. Experimentally, DF-MPC achieves higher accuracy for ultra-low-precision quantized models than recent methods, without any data or fine-tuning.
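To make the compensation idea concrete, the following is a minimal sketch, not the paper's actual derivation. It assumes two consecutive linear layers with no nonlinearity between them, and the names quantize_uniform and compensate_next_layer are invented here for illustration. A pseudoinverse-based least-squares solution in weight space stands in for the paper's closed-form minimization of the feature-map reconstruction loss; both are data-free in the sense that no original or synthetic inputs are used.

```python
# Illustrative sketch of data-free mixed-precision compensation.
# NOT the paper's exact method: the two-linear-layer setup, the
# weight-space least-squares objective, and all function names are
# assumptions made for this example.
import numpy as np

def quantize_uniform(w: np.ndarray, bits: int) -> np.ndarray:
    """Symmetric uniform quantization of a weight tensor to `bits` bits."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(w).max() / qmax
    return np.round(w / scale).clip(-qmax, qmax) * scale

def compensate_next_layer(w1, w2, low_bits=2, high_bits=8):
    """Quantize w1 at ultra-low precision, then solve a closed-form
    least-squares problem so the higher-precision next layer absorbs
    the quantization error of the composed linear map w2 @ w1."""
    w1_q = quantize_uniform(w1, low_bits)
    # Closed form: argmin_W ||w2 @ w1 - W @ w1_q||_F  =>  W = (w2 @ w1) @ pinv(w1_q)
    w2_comp = (w2 @ w1) @ np.linalg.pinv(w1_q)
    return w1_q, quantize_uniform(w2_comp, high_bits)

rng = np.random.default_rng(0)
w1 = rng.standard_normal((64, 32))
w2 = rng.standard_normal((16, 64))
w1_q, w2_q = compensate_next_layer(w1, w2)
naive = quantize_uniform(w2, 8) @ quantize_uniform(w1, 2)
print("naive error:      ", np.linalg.norm(w2 @ w1 - naive))
print("compensated error:", np.linalg.norm(w2 @ w1 - w2_q @ w1_q))
```

Running this shows the compensated pair approximating the full-precision composed map far more closely than naively quantizing each layer in isolation, which is the intuition behind letting a high-precision layer absorb a low-precision layer's error.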


Related Research

08/14/2023 · Unified Data-Free Compression: Pruning and Quantization without Fine-Tuning
Structured pruning and quantization are promising approaches for reducin...

03/29/2021 · Zero-shot Adversarial Quantization
Model quantization is a promising approach to compress deep neural netwo...

02/14/2022 · SQuant: On-the-Fly Data-Free Quantization via Diagonal Hessian Approximation
Quantization of deep neural networks (DNN) has been proven effective for...

12/21/2020 · DAQ: Distribution-Aware Quantization for Deep Image Super-Resolution Networks
Quantizing deep convolutional neural networks for image super-resolution...

12/03/2019 · The Knowledge Within: Methods for Data-Free Model Compression
Background: Recently, an extensive amount of research has been focused o...

10/12/2018 · Quantization for Rapid Deployment of Deep Neural Networks
This paper aims at rapid deployment of the state-of-the-art deep neural ...

07/24/2023 · A Model for Every User and Budget: Label-Free and Personalized Mixed-Precision Quantization
Recent advancement in Automatic Speech Recognition (ASR) has produced la...
