Hyperdrive: A Multi-Chip Systolically Scalable Binary-Weight CNN Inference Engine

03/05/2018
by Renzo Andri, et al.

Deep neural networks have achieved impressive results in computer vision and machine learning. Unfortunately, state-of-the-art networks are extremely compute- and memory-intensive, which makes them unsuitable for mW-class devices such as IoT end-nodes. Aggressive quantization of these networks dramatically reduces their computation and memory footprint; binary-weight neural networks (BWNs) follow this trend, pushing weight quantization to the limit. Hardware accelerators for BWNs presented so far have focused on core efficiency, disregarding the I/O bandwidth and system-level efficiency that are crucial for deploying accelerators in ultra-low-power devices. We present Hyperdrive, a BWN accelerator that dramatically reduces I/O bandwidth by exploiting a novel binary-weight streaming approach. Hyperdrive supports arbitrary convolutional neural network architectures and input resolutions by exploiting the natural scalability of its compute units at both chip and system level: multiple Hyperdrive chips are arranged systolically in a 2D mesh and process the entire feature map together in parallel. Hyperdrive achieves a system-level efficiency of 4.3 TOp/s/W (i.e., including I/O), 3.1x higher than state-of-the-art BWN accelerators, even though its core uses resource-intensive FP16 arithmetic for increased robustness.
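To make the binary-weight idea concrete: when every weight is constrained to {-1, +1}, each multiply-accumulate on an FP16 feature map collapses into an addition or subtraction. The NumPy sketch below illustrates this arithmetic only; it is a behavioural model with assumed names and shapes (binary_weight_conv2d, fmap, bin_weights), not the Hyperdrive datapath, and it computes a plain valid (no-padding, stride-1) convolution in the deep-learning convention.

```python
# Minimal NumPy sketch (not the Hyperdrive RTL): with weights restricted to
# {-1, +1}, every multiply-accumulate on the FP16 feature map reduces to a
# signed addition. Names and shapes are illustrative assumptions.
import numpy as np


def binary_weight_conv2d(fmap, bin_weights):
    """Valid (no-padding, stride-1) convolution with binary weights.

    fmap:        (C_in, H, W) float16 feature map
    bin_weights: (C_out, C_in, K, K) entries in {-1, +1}
    returns:     (C_out, H-K+1, W-K+1) float16 output feature map
    """
    c_in, h, w = fmap.shape
    c_out, _, k, _ = bin_weights.shape
    out = np.zeros((c_out, h - k + 1, w - k + 1), dtype=np.float16)
    for co in range(c_out):
        for ci in range(c_in):
            for ky in range(k):
                for kx in range(k):
                    # Shifted window of this input channel covering all output pixels.
                    patch = fmap[ci, ky:ky + h - k + 1, kx:kx + w - k + 1]
                    # "Multiplying" by a +/-1 weight is just an add or a subtract.
                    if bin_weights[co, ci, ky, kx] > 0:
                        out[co] += patch
                    else:
                        out[co] -= patch
    return out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    fmap = rng.standard_normal((8, 16, 16)).astype(np.float16)
    weights = rng.choice(np.array([-1, 1], dtype=np.int8), size=(4, 8, 3, 3))
    print(binary_weight_conv2d(fmap, weights).shape)  # -> (4, 14, 14)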

