BiTAT: Neural Network Binarization with Task-dependent Aggregated Transformation

07/04/2022
by Geon Park, et al.

Neural network quantization aims to transform the high-precision weights and activations of a given neural network into low-precision weights and activations, reducing memory usage and computation while preserving the performance of the original model. However, extreme quantization (1-bit weights and 1-bit activations) of compactly designed backbone architectures (e.g., MobileNets), which are often used for edge-device deployment, results in severe performance degradation. This paper proposes a novel Quantization-Aware Training (QAT) method that effectively alleviates this degradation even under extreme quantization by focusing on inter-weight dependencies, both among the weights within each layer and across consecutive layers. To minimize the impact that quantizing each weight has on the others, we perform an orthonormal transformation of the weights at each layer by training an input-dependent correlation matrix and an importance vector, so that each weight is disentangled from the others. We then quantize the weights according to their importance, minimizing the loss of information from the original weights and activations. We further perform progressive layer-wise quantization from the bottom layer to the top, so that quantization at each layer reflects the quantized distributions of weights and activations at the preceding layers. We validate the effectiveness of our method on various benchmark datasets against strong neural quantization baselines, demonstrating that it alleviates performance degradation on ImageNet and successfully preserves full-precision model performance on CIFAR-100 with compact backbone networks.

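To make the core idea concrete, the following is a minimal, illustrative sketch (not the authors' released implementation) of binarizing a layer's weights in a learned orthonormal basis with per-dimension importance scales. The class and parameter names (OrthoBinarizedLinear, rot_param, importance) are assumptions for illustration; the actual BiTAT method additionally learns an input-dependent correlation matrix and applies progressive layer-wise quantization, which this sketch omits.

```python
# Sketch only: rotate weights with a learned orthonormal matrix, scale by a learned
# importance vector, binarize with a straight-through estimator, then rotate back.
import torch
import torch.nn as nn


class BinarizeSTE(torch.autograd.Function):
    """sign() in the forward pass, straight-through gradient in the backward pass."""

    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        return torch.sign(x)

    @staticmethod
    def backward(ctx, grad_out):
        (x,) = ctx.saved_tensors
        # Pass gradients only where |x| <= 1, as in common binary-network training.
        return grad_out * (x.abs() <= 1).float()


class OrthoBinarizedLinear(nn.Module):
    """Linear layer whose weights are binarized in a learned rotated basis."""

    def __init__(self, in_features, out_features):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_features, in_features) * 0.01)
        # Unconstrained parameter; an orthonormal matrix is obtained as the matrix
        # exponential of its skew-symmetric part, so R @ R.T == I by construction.
        self.rot_param = nn.Parameter(torch.zeros(in_features, in_features))
        # Per-dimension importance scales, learned jointly with the rotation.
        self.importance = nn.Parameter(torch.ones(in_features))

    def forward(self, x):
        skew = self.rot_param - self.rot_param.T
        rotation = torch.matrix_exp(skew)          # orthonormal by construction
        w_rot = self.weight @ rotation              # disentangle weight dimensions
        w_bin = BinarizeSTE.apply(w_rot) * self.importance
        w_back = w_bin @ rotation.T                 # map back to the original basis
        return nn.functional.linear(x, w_back)
```

In the paper's progressive scheme, layers would be quantized in order from the bottom of the network to the top, so that each layer is trained against the already-quantized outputs of the layers below it; the sketch above shows only the per-layer transformation.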

