Reconfigurable co-processor architecture with limited numerical precision to accelerate deep convolutional neural networks

08/21/2021
by Sasindu Wijeratne, et al.

Convolutional Neural Networks (CNNs) are widely used in deep learning applications such as vision systems and robotics. However, existing software implementations are inefficient, so many hardware accelerators have been proposed to optimize the performance, power, and resource utilization of the implementation. Amongst existing solutions, Field Programmable Gate Array (FPGA) based architectures offer better cost-energy-performance trade-offs as well as scalability and shorter development time. In this paper, we present a model-independent reconfigurable co-processing architecture to accelerate CNNs. Our architecture consists of parallel Multiply and Accumulate (MAC) units with caching techniques and interconnection networks to exploit maximum data parallelism. In contrast to existing solutions, we introduce limited-precision 32-bit Q-format fixed-point quantization for arithmetic representations and operations. As a result, our architecture achieves a significant reduction in resource utilization with competitive accuracy. Furthermore, we developed assembly-type microinstructions to access the co-processing fabric and manage layer-wise parallelism, thereby re-using limited resources. Finally, we tested our architecture with kernel sizes up to 9x9 on a Xilinx Virtex-7 FPGA, achieving a throughput of up to 226.2 GOp/s for a 3x3 kernel.
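To give a feel for the Q-format fixed-point arithmetic used by the MAC units, the following is a minimal software sketch in C. It assumes a Q16.16 split (16 integer bits, 16 fractional bits) of the 32-bit word, which the abstract does not specify, and emulates one multiply-and-accumulate pass over a 3x3 window; the names (q16_16_t, to_q, from_q, mac) are hypothetical and purely illustrative, not part of the described co-processor.

```c
#include <stdint.h>
#include <stdio.h>

/* Assumed Q16.16 layout: 16 integer bits, 16 fractional bits packed into a
 * signed 32-bit word. The paper's exact integer/fraction split is not stated
 * here, so this split is an assumption. */
#define Q_FRAC_BITS 16

typedef int32_t q16_16_t;

/* Convert a double to Q16.16 (no saturation handling in this sketch). */
static q16_16_t to_q(double x) {
    return (q16_16_t)(x * (1 << Q_FRAC_BITS));
}

/* Convert Q16.16 back to double for inspection. */
static double from_q(q16_16_t x) {
    return (double)x / (1 << Q_FRAC_BITS);
}

/* One multiply-accumulate step: the 32x32-bit product is kept in 64 bits,
 * then shifted back down to Q16.16 before being added to the accumulator. */
static q16_16_t mac(q16_16_t acc, q16_16_t a, q16_16_t b) {
    int64_t prod = (int64_t)a * (int64_t)b;        /* Q32.32 intermediate */
    return acc + (q16_16_t)(prod >> Q_FRAC_BITS);  /* back to Q16.16 */
}

int main(void) {
    /* Toy 3x3 window and kernel, mirroring the 3x3 kernel case. */
    double pixel[9]  = {0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9};
    double kernel[9] = {1.0, 0.0, -1.0, 2.0, 0.0, -2.0, 1.0, 0.0, -1.0};

    q16_16_t acc = 0;
    for (int i = 0; i < 9; i++)
        acc = mac(acc, to_q(pixel[i]), to_q(kernel[i]));

    printf("fixed-point result: %f\n", from_q(acc));
    return 0;
}
```

In hardware, each parallel MAC unit would perform the equivalent of the mac() step above on cached operands every cycle; the sketch only shows the numerical behavior of the quantized arithmetic.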

