Communication-Efficient Separable Neural Network for Distributed Inference on Edge Devices

11/03/2021
by Jun-Liang Lin, et al.

The inference of neural networks is usually restricted by the resources (e.g., computing power, memory, bandwidth) available on edge devices. Besides improving hardware design and deploying efficient models, it is possible to aggregate the computing power of many devices to run machine learning models. In this paper, we propose a novel method that exploits model parallelism to separate a neural network for distributed inference. To achieve a better balance among communication latency, computation latency, and performance, we adopt neural architecture search (NAS) to find the best transmission policy and reduce the amount of communication. The best model we found reduces communication by 86.6% compared with the baseline without significantly impacting performance. Under proper device specifications and model configurations, our experiments show that the inference of large neural networks on edge clusters can be distributed and accelerated, providing a new solution for deploying intelligent applications in the Internet of Things (IoT).
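To make the idea of model-parallel split inference concrete, the sketch below shows one way a network could be partitioned between two devices, with a small bottleneck layer standing in for the searched transmission policy that shrinks the activation sent over the link. The layer sizes, the 1x1 bottleneck, and the two-device split are illustrative assumptions for this sketch, not the architecture or policy used in the paper.

```python
# Minimal sketch (assumed architecture, not the authors' implementation):
# device A runs the head of the network, compresses the intermediate
# activation to reduce communication, and device B runs the tail.
import torch
import torch.nn as nn


class HeadOnDeviceA(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
        )
        # 1x1 bottleneck that reduces channels before transmission;
        # a stand-in for the transmission policy found by NAS.
        self.compress = nn.Conv2d(64, 8, 1)

    def forward(self, x):
        return self.compress(self.features(x))


class TailOnDeviceB(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.decompress = nn.Conv2d(8, 64, 1)
        self.classifier = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, num_classes)
        )

    def forward(self, z):
        return self.classifier(self.decompress(z))


head, tail = HeadOnDeviceA().eval(), TailOnDeviceB().eval()
x = torch.randn(1, 3, 32, 32)
with torch.no_grad():
    z = head(x)                    # computed on device A
    payload = z.numpy().tobytes()  # bytes that would cross the network link
    logits = tail(z)               # computed on device B
print(f"transmitted {len(payload)} bytes, logits shape {tuple(logits.shape)}")
```

In this toy split, the 8-channel bottleneck sends an 8x smaller activation than the raw 64-channel feature map would, which is the kind of communication/accuracy trade-off the NAS search is balancing.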


