Towards Heterogeneous Multi-core Accelerators Exploiting Fine-grained Scheduling of Layer-Fused Deep Neural Networks

12/20/2022
by Arne Symons, et al.

To keep up with the ever-growing performance demand of neural networks, specialized hardware (HW) accelerators are shifting towards multi-core and chiplet architectures. So far, these multi-accelerator systems exploit the increased parallelism by pipelining different NN layers across input batches on different cores to increase throughput. Yet, for latency-critical applications that rely on non-batched, layer-by-layer scheduling, this approach fails to fully exploit the available HW resources for energy-efficient execution at the edge. This work, therefore, enables fine-grained depth-first scheduling of layer-fused DNNs onto multi-core architectures through an open-source modeling framework called Stream. Stream can represent a wide range of scheduling granularities and HW architectures, and it optimizes execution schedules towards minimal energy, minimal latency and/or minimal memory footprint for constrained edge devices. We validate Stream against three SotA HW implementations that employ layer-fused scheduling and show a close match with their measured efficiencies. Using Stream in further explorations, we demonstrate that high-level architectural decisions greatly impact hardware efficiency under the fine-grained scheduling paradigm, with energy-delay product reductions ranging from 2.4x for single-core architectures to up to 30x for heterogeneous multi-core architectures, compared to traditional scheduling at layer granularity.
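To make the contrast between layer-granularity and fine-grained scheduling concrete, the following is a minimal toy sketch, not Stream's API or cost model. It assumes a linear chain of layers, one core per layer, every layer split into the same number of tiles, a fixed per-tile compute time, and that tile j of a layer depends only on tile j of the previous layer (no halo overlap, no communication cost). The function names and the example numbers are hypothetical.

```python
def layer_by_layer_latency(tile_times, num_tiles):
    """Latency when each layer finishes all of its tiles before the next starts.

    tile_times[l] is the assumed time for layer l's core to process one tile.
    With a single non-batched input, only one core is busy at any moment.
    """
    return sum(t * num_tiles for t in tile_times)


def fine_grained_latency(tile_times, num_tiles):
    """Latency when tiles flow through the layer chain as soon as they are ready.

    Tile j of layer l starts once (a) the core of layer l has finished tile j-1
    and (b) tile j of layer l-1 has been produced.
    """
    finish = [[0] * num_tiles for _ in tile_times]
    for l, t in enumerate(tile_times):
        for j in range(num_tiles):
            core_free = finish[l][j - 1] if j > 0 else 0
            input_ready = finish[l - 1][j] if l > 0 else 0
            finish[l][j] = max(core_free, input_ready) + t
    return finish[-1][-1]


# Hypothetical example: three layers on three cores, four tiles per layer.
tile_times = [2, 3, 1]
num_tiles = 4
coarse = layer_by_layer_latency(tile_times, num_tiles)
fine = fine_grained_latency(tile_times, num_tiles)
print(coarse, fine)
```

In this toy setting the fine-grained schedule overlaps the work of the three cores, so its latency is bounded by the slowest layer plus pipeline fill and drain rather than by the sum of all layers. Real schedules also have to account for tile dependencies across neighbouring tiles, inter-core communication, and memory footprint, which this sketch deliberately ignores.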


