Ordering Chaos: Memory-Aware Scheduling of Irregularly Wired Neural Networks for Edge Devices

by Byung Hoon Ahn, et al.

Recent advances demonstrate that irregularly wired neural networks from Neural Architecture Search (NAS) and Random Wiring can not only automate the design of deep neural networks but also emit models that outperform previous manual designs. These design methods are especially effective under hard resource constraints (memory, MACs, ...), which highlights the importance of this class of neural networks. However, such a move complicates the previously streamlined pattern of execution. In fact, one of the main challenges is that the order in which the nodes of such a network are executed significantly affects the memory footprint of the intermediate activations. Current compilers do not schedule with regard to activation memory footprint, so the peak footprint increases significantly compared to the optimum, which makes the result unsuitable for edge devices. To address this standing issue, we present a memory-aware compiler, dubbed SERENITY, that uses dynamic programming to find a schedule with optimal memory footprint. Our solution also comprises a graph rewriting technique that allows further reduction beyond the optimum. As such, SERENITY achieves optimal peak memory, and the graph rewriting technique improves on this further, resulting in a 1.68x improvement with the dynamic programming-based scheduler and 1.86x with graph rewriting against TensorFlow Lite, with less than one minute of overhead.
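To make the effect of node order on peak activation memory concrete, the following is a minimal sketch of a dynamic-programming scheduler over sets of already-executed nodes. It is not SERENITY's implementation: the function names (`build_succs`, `live_bytes`, `peak_of_order`, `min_peak_schedule`), the toy graph, and the memory model are all assumptions made for illustration. The model assumes that an activation stays in memory until every consumer has run, and that a node's inputs and output are resident at the same time while it executes.

```python
from functools import lru_cache


def build_succs(sizes, preds):
    """Invert the predecessor map into a successor (consumer) map."""
    succs = {u: [] for u in sizes}
    for v, ps in preds.items():
        for p in ps:
            succs[p].append(v)
    return succs


def live_bytes(done, sizes, succs):
    """Memory held after executing `done`: outputs that still have a pending
    consumer, plus graph outputs (nodes with no consumers at all)."""
    total = 0
    for u in done:
        consumers = succs[u]
        if not consumers or any(c not in done for c in consumers):
            total += sizes[u]
    return total


def peak_of_order(order, sizes, preds):
    """Peak memory of one given topological order under the assumed model."""
    succs = build_succs(sizes, preds)
    done, peak = set(), 0
    for v in order:
        # While v runs, everything live so far plus v's output is resident.
        peak = max(peak, live_bytes(done, sizes, succs) + sizes[v])
        done.add(v)
    return peak


def min_peak_schedule(sizes, preds):
    """Exhaustive DP memoized on the set of executed nodes.
    Exponential in the worst case; intended only for small graphs."""
    nodes = list(sizes)
    succs = build_succs(sizes, preds)

    @lru_cache(maxsize=None)
    def solve(done):
        if len(done) == len(nodes):
            return 0, ()
        held = live_bytes(done, sizes, succs)
        best = None
        for v in nodes:
            # Skip nodes already run or whose inputs are not all available.
            if v in done or any(p not in done for p in preds.get(v, ())):
                continue
            rest_peak, rest_order = solve(done | {v})
            peak = max(held + sizes[v], rest_peak)
            if best is None or peak < best[0]:
                best = (peak, (v,) + rest_order)
        return best

    return solve(frozenset())


# Hypothetical toy graph: activation sizes in arbitrary units.
sizes = {"a": 1, "b": 4, "c": 4, "d": 1, "e": 1, "f": 1}
preds = {"b": ["a"], "c": ["a"], "d": ["b"], "e": ["c"], "f": ["d", "e"]}

naive_peak = peak_of_order(["a", "b", "c", "d", "e", "f"], sizes, preds)
best_peak, best_order = min_peak_schedule(sizes, preds)
print(naive_peak, best_peak, best_order)
```

On this toy graph, running both wide branches (`b` and `c`) before reducing either keeps two large activations alive at once, whereas finishing one branch first lets its large activation be freed early. The memoization key is the set of executed nodes, because under the assumed model the live memory depends only on that set and not on the order in which it was reached.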




Memory-aware Scheduling for Complex Wired Networks with Iterative Graph Optimization

Memory-aware network scheduling is becoming increasingly important for d...

Hardware-Aware Graph Neural Network Automated Design for Edge Computing Platforms

Graph neural networks (GNNs) have emerged as a popular strategy for hand...

A Graph Theoretic Framework of Recomputation Algorithms for Memory-Efficient Backpropagation

Recomputation algorithms collectively refer to a family of methods that ...

MCUNetV2: Memory-Efficient Patch-based Inference for Tiny Deep Learning

Tiny deep learning on microcontroller units (MCUs) is challenging due to...

μNAS: Constrained Neural Architecture Search for Microcontrollers

IoT devices are powered by microcontroller units (MCUs) which are extrem...

Few-Bit Backward: Quantized Gradients of Activation Functions for Memory Footprint Reduction

Memory footprint is one of the main limiting factors for large neural ne...

Memory-Aware Fusing and Tiling of Neural Networks for Accelerated Edge Inference

A rising research challenge is running costly machine learning (ML) netw...
