n-hot: Efficient bit-level sparsity for powers-of-two neural network quantization

03/22/2021
by   Yuiko Sakuma, et al.

Powers-of-two (PoT) quantization reduces the number of bit operations of deep neural networks on resource-constrained hardware. However, PoT quantization causes a severe accuracy drop because of its limited representational ability. Since DNN models are now applied to relatively complex tasks (e.g., classification on large datasets and object detection), the accuracy of PoT quantization must be improved. Although some previous works attempt to improve the accuracy of PoT quantization, none balances accuracy and computation cost in a memory-efficient way. To address this problem, we propose an efficient PoT quantization scheme that introduces bit-level sparsity: weights (or activations) are rounded to values that can be computed with n shift operations per multiplication, and each operation may be either an addition or a subtraction. Moreover, we use a two-stage fine-tuning algorithm to recover the accuracy drop caused by introducing the bit-level sparsity. Experimental results on an object detection model (CenterNet with a MobileNet-v2 backbone) on the COCO dataset show that, compared to the uniform method, our proposed method suppresses the accuracy drop by 0.3% while reducing the number of operations by about 75%.
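The core idea, rounding a value to a sum or difference of at most n powers of two so that each multiplication becomes n shifts plus additions/subtractions, can be sketched with a greedy residual-matching loop. This is a minimal illustration, not the paper's actual algorithm; the function name and the exponent range are assumptions made for the example.

```python
import math

def n_hot_quantize(w, n=2, min_exp=-8, max_exp=0):
    """Greedily approximate w as a signed sum of at most n powers of two.

    Sketch of the n-hot idea: each chosen term costs one shift, and allowing
    negative terms (subtraction) widens the set of representable values.
    The exponent clamp [min_exp, max_exp] is illustrative, not from the paper.
    """
    residual = w
    q = 0.0
    for _ in range(n):
        if residual == 0:
            break
        sign = 1.0 if residual > 0 else -1.0
        # Nearest power of two to the remaining residual, clamped to range.
        exp = round(math.log2(abs(residual)))
        exp = max(min_exp, min(max_exp, exp))
        term = sign * (2.0 ** exp)
        q += term
        residual -= term
    return q
```

For example, with n = 2 the value 0.7 is rounded to 0.75 (= 2^-1 + 2^-2), which a PoT multiplier can realize with two shifts and one addition.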


Related research

- Learnable Companding Quantization for Accurate Low-bit Neural Networks (03/12/2021)
  Quantizing deep neural networks is an effective method for reducing memo...
- DNN Quantization with Attention (03/24/2021)
  Low-bit quantization of network weights and activations can drastically ...
- Accelerating Neural Network Inference by Overflow Aware Quantization (05/27/2020)
  The inherent heavy computation of deep neural networks prevents their wi...
- SME: ReRAM-based Sparse-Multiplication-Engine to Squeeze-Out Bit Sparsity of Neural Network (03/02/2021)
  Resistive Random-Access-Memory (ReRAM) crossbar is a promising technique...
- Fast Adjustable Threshold For Uniform Neural Network Quantization (12/19/2018)
  Neural network quantization procedure is the necessary step for porting ...
- Q-DETR: An Efficient Low-Bit Quantized Detection Transformer (04/01/2023)
  The recent detection transformer (DETR) has advanced object detection, b...
- Low-bit Shift Network for End-to-End Spoken Language Understanding (07/15/2022)
  Deep neural networks (DNN) have achieved impressive success in multiple ...
