A GPU-Outperforming FPGA Accelerator Architecture for Binary Convolutional Neural Networks

02/20/2017
by Yixing Li, et al.

FPGA-based hardware accelerators for convolutional neural networks (CNNs) have attracted great attention due to their higher energy efficiency than GPUs. However, it is challenging for FPGA-based solutions to achieve higher throughput than their GPU counterparts. In this paper, we demonstrate that FPGA acceleration can be a superior solution in terms of both throughput and energy efficiency when a CNN is trained with binary constraints on weights and activations. Specifically, we propose an optimized FPGA accelerator architecture tailored for bitwise convolution and normalization that features massive spatial parallelism with deep pipeline stages. A key advantage of the FPGA accelerator is that its performance is insensitive to data batch size, while GPU performance varies significantly with batch size. Experimental results show that the proposed accelerator architecture for binary CNNs, running on a Virtex-7 FPGA, is 8.3x faster and 75x more energy-efficient than a Titan X GPU when processing individual online requests in small batch sizes. For processing static data in large batch sizes, the proposed solution matches the throughput of a Titan X GPU while delivering 9.5x higher energy efficiency.
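The throughput advantage comes from the binary constraint itself: with weights and activations restricted to {-1, +1}, each multiply-accumulate in a convolution collapses to an XNOR followed by a population count, which maps directly onto FPGA lookup tables. The sketch below illustrates this reduction only; it is not the paper's accelerator design, and the function name and bit encoding are assumptions for the example.

```python
# Illustrative sketch of bitwise (binary) convolution arithmetic, not the
# paper's implementation. Vectors over {-1, +1} are bit-packed so that
# bit = 1 encodes +1 and bit = 0 encodes -1.

def binary_dot(a_bits: int, w_bits: int, n: int) -> int:
    """Dot product of two n-element {-1, +1} vectors packed as n-bit integers."""
    xnor = ~(a_bits ^ w_bits) & ((1 << n) - 1)  # 1 wherever the signs agree
    matches = bin(xnor).count("1")              # popcount of agreements
    return 2 * matches - n                      # +1 per match, -1 per mismatch

# Example: a = [+1, -1, +1, -1], w = [+1, +1, -1, -1]  ->  dot product = 0
a_bits = 0b0101
w_bits = 0b0011
print(binary_dot(a_bits, w_bits, 4))  # prints 0
```

Because each output pixel reduces to such XNOR/popcount operations plus a threshold from normalization, many of them can be evaluated in parallel and deeply pipelined on the FPGA, which is why the reported performance does not depend on batching requests the way GPU acceleration does.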


