Communication Lower Bound in Convolution Accelerators

11/08/2019
by Xiaoming Chen, et al.

In current convolutional neural network (CNN) accelerators, communication (i.e., memory access) dominates the energy consumption. This work provides a comprehensive analysis and methodologies to minimize communication in CNN accelerators. For off-chip communication, we derive the theoretical lower bound for any convolutional layer and propose a dataflow that reaches this bound, a fundamental problem left unsolved by prior studies. On-chip communication is minimized through an elaborate workload and storage mapping scheme. In addition, we design a communication-optimal CNN accelerator architecture. Evaluations based on a 65nm technology demonstrate that the proposed architecture nearly reaches the theoretical minimum communication in a three-level memory hierarchy and is computation dominant. The gap between the energy efficiency of our accelerator and the theoretical best value is only 37-87%.
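To give a feel for what such a communication lower bound implies, here is a minimal Python sketch. It uses the classical Hong-Kung-style estimate Q >= 2 * #MACs / sqrt(S) for an on-chip buffer of S words; this is the well-known matrix-multiplication-style bound used purely for illustration, not necessarily the exact bound derived in this paper, and the function names, layer shape, and buffer size are invented examples.

```python
# Illustrative only: a Hong-Kung-style off-chip communication estimate
# for one convolutional layer. The paper's actual bound may differ.
from math import sqrt

def conv_macs(batch, out_h, out_w, out_c, in_c, k_h, k_w):
    """Total multiply-accumulate operations in one convolutional layer."""
    return batch * out_h * out_w * out_c * in_c * k_h * k_w

def comm_lower_bound(macs, sram_words):
    """Classical estimate Q >= 2 * #MACs / sqrt(S), where S is the
    on-chip buffer capacity in words (hypothetical, for illustration)."""
    return 2 * macs / sqrt(sram_words)

# Example: a ResNet-like 3x3 layer, 56x56x64 -> 56x56x64, batch 1,
# with a 32K-word on-chip buffer.
macs = conv_macs(1, 56, 56, 64, 64, 3, 3)
print(f"MACs: {macs:,}")
print(f"Off-chip words (lower-bound estimate): {comm_lower_bound(macs, 32 * 1024):,.0f}")
```

Shrinking the buffer S raises the bound, which is why a dataflow that actually attains the minimum for a given memory hierarchy matters in practice.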


Related research

02/04/2019 · CapStore: Energy-Efficient Design and Management of the On-Chip Memory for CapsuleNet Inference Accelerators
Deep Neural Networks (DNNs) have been established as the state-of-the-ar...

12/03/2019 · Understanding the Impact of On-chip Communication on DNN Accelerator Performance
Deep Neural Networks have flourished at an unprecedented pace in recent ...

08/09/2023 · CiFHER: A Chiplet-Based FHE Accelerator with a Resizable Structure
Fully homomorphic encryption (FHE) is in the spotlight as a definitive s...

04/20/2021 · CoDR: Computation and Data Reuse Aware CNN Accelerator
Computation and Data Reuse is critical for the resource-limited Convolut...

07/29/2020 · Transaction-level Model Simulator for Communication-Limited Accelerators
Rapid design space exploration in early design stage is critical to algo...

05/03/2020 · TIMELY: Pushing Data Movements and Interfaces in PIM Accelerators Towards Local and in Time Domain
Resistive-random-access-memory (ReRAM) based processing-in-memory (R^2PI...

02/04/2022 · EcoFlow: Efficient Convolutional Dataflows for Low-Power Neural Network Accelerators
Dilated and transposed convolutions are widely used in modern convolutio...
