NEURAghe: Exploiting CPU-FPGA Synergies for Efficient and Flexible CNN Inference Acceleration on Zynq SoCs

12/04/2017
by   Paolo Meloni, et al.
0

Deep convolutional neural networks (CNNs) obtain outstanding results in tasks that require human-level understanding of data, like image or speech recognition. However, their computational load is significant, motivating the development of CNN-specialized accelerators. This work presents NEURAghe, a flexible and efficient hardware/software solution for the acceleration of CNNs on Zynq SoCs. NEURAghe leverages the synergistic usage of Zynq ARM cores and of a powerful and flexible Convolution-Specific Processor deployed on the reconfigurable logic. The Convolution-Specific Processor embeds both a convolution engine and a programmable soft core, releasing the ARM processors from most of the supervision duties and allowing the accelerator to be controlled by software at an ultra-fine granularity. This methodology opens the way for cooperative heterogeneous computing: while the accelerator takes care of the bulk of the CNN workload, the ARM cores can seamlessly execute hard-to-accelerate parts of the computational graph, taking advantage of the NEON vector engines to further speed up computation. Through the companion NeuDNN SW stack, NEURAghe supports end-to-end CNN-based classification with a peak performance of 169 Gops/s, and an energy efficiency of 17 Gops/W. Thanks to our heterogeneous computing model, our platform improves upon the state-of-the-art, achieving a frame rate of 5.5 fps on the end-to-end execution of VGG-16, and 6.6 fps on ResNet-18.

READ FULL TEXT
research
03/28/2018

Synergy: A HW/SW Framework for High Throughput CNNs on Embedded Heterogeneous SoC

Convolutional Neural Networks (CNN) have been widely deployed in diverse...
research
01/04/2022

A Heterogeneous In-Memory Computing Cluster For Flexible End-to-End Inference of Real-World Deep Neural Networks

Deployment of modern TinyML tasks on small battery-constrained IoT devic...
research
05/09/2018

Performance evaluation over HW/SW co-design SoC memory transfers for a CNN accelerator

Many FPGAs vendors have recently included embedded processors in their d...
research
04/12/2021

Optimizing the Whole-life Cost in End-to-end CNN Acceleration

The acceleration of CNNs has gained increasing atten-tion since their su...
research
05/04/2018

Performance tuning for deep learning on a many-core processor (master thesis)

Convolutional neural networks (CNNs) are becoming very successful and po...
research
03/19/2018

AISC: Approximate Instruction Set Computer

This paper makes the case for a single-ISA heterogeneous computing platf...
research
03/06/2023

Domain-Specific Computational Storage for Serverless Computing

While (1) serverless computing is emerging as a popular form of cloud ex...

Please sign up or login with your details

Forgot password? Click here to reset