Efficient non-uniform quantizer for quantized neural network targeting reconfigurable hardware

11/27/2018
by   Natan Liss, et al.

Convolutional Neural Networks (CNNs) have become a popular choice for various tasks such as computer vision, speech recognition and natural language processing. Thanks to their large computational capability and throughput, GPUs are the most common platform for both training and inference; however, GPUs are not power efficient and therefore do not suit low-power systems such as mobile devices. Recent studies have shown that FPGAs can provide a good alternative to GPUs as CNN accelerators, due to their reconfigurable nature, low power consumption and small latency. For FPGA-based accelerators to outperform GPUs on inference tasks, both the parameters of the network and the activations must be quantized. While most works use uniform quantizers for both parameters and activations, a uniform quantizer is not always optimal, and a non-uniform quantizer should be considered. In this work we introduce a custom hardware-friendly approach to implementing non-uniform quantizers. In addition, we use a single-scale integer representation of both parameters and activations, for both training and inference. The combined method yields a hardware-efficient non-uniform quantizer fit for real-time applications. We tested our method on the CIFAR-10 and CIFAR-100 image classification datasets with ResNet-18 and VGG-like architectures, and observed little degradation in accuracy.
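To illustrate why a non-uniform quantizer can beat a uniform one, the sketch below compares the two on bell-shaped synthetic weights, where most values cluster near zero. This is a generic Lloyd-style (k-means) placement of quantization levels, shown only as an assumption-laden illustration; it is not the paper's hardware-friendly construction or its integer representation.

```python
import numpy as np

def uniform_quantize(x, levels):
    """Uniform quantizer: evenly spaced levels spanning the value range."""
    lo, hi = x.min(), x.max()
    step = (hi - lo) / (levels - 1)
    return lo + np.round((x - lo) / step) * step

def nonuniform_quantize(x, levels, iters=20):
    """Non-uniform quantizer: level positions refined by Lloyd iterations,
    so densely populated regions of the distribution get finer resolution.
    (Illustrative only; not the construction proposed in the paper.)"""
    # Initialize the codebook at equal-probability quantiles of the data.
    codebook = np.quantile(x, np.linspace(0.0, 1.0, levels))
    for _ in range(iters):
        # Assign each value to its nearest level, then move each level
        # to the mean of the values assigned to it.
        idx = np.abs(x[:, None] - codebook[None, :]).argmin(axis=1)
        for k in range(levels):
            mask = idx == k
            if mask.any():
                codebook[k] = x[mask].mean()
    idx = np.abs(x[:, None] - codebook[None, :]).argmin(axis=1)
    return codebook[idx]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w = rng.normal(0.0, 1.0, 10_000)  # Gaussian-like weights, dense near 0
    for q in (uniform_quantize, nonuniform_quantize):
        mse = np.mean((w - q(w, 16)) ** 2)
        print(f"{q.__name__}: MSE = {mse:.5f}")
```

With 16 levels on Gaussian-like data, the non-uniform codebook concentrates levels near zero where the mass is, giving a lower mean-squared quantization error than evenly spaced levels; the engineering challenge the paper addresses is making such irregular level sets cheap to realize in reconfigurable hardware.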


