Communication-Efficient Separable Neural Network for Distributed Inference on Edge Devices

11/03/2021
by Jun-Liang Lin, et al.

The inference of neural networks is usually restricted by the resources (e.g., computing power, memory, bandwidth) available on edge devices. Besides improving hardware design and deploying efficient models, it is possible to aggregate the computing power of many devices to run machine learning models. In this paper, we propose a novel method that exploits model parallelism to separate a neural network for distributed inference. To achieve a better balance among communication latency, computation latency, and performance, we adopt neural architecture search (NAS) to find the best transmission policy and reduce the amount of communication. The best model we found reduces communication by 86.6% compared with the baseline without significantly impacting performance. Under proper device specifications and model configurations, our experiments show that the inference of large neural networks on edge clusters can be distributed and accelerated, providing a new solution for deploying intelligent applications in the Internet of Things (IoT).
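To make the idea of model-parallel split inference concrete, the sketch below shows one way a network could be partitioned between two devices, with a small bottleneck layer standing in for the searched transmission policy that shrinks the activation sent over the link. The layer sizes, the 1x1 bottleneck, and the two-device split are illustrative assumptions for this sketch, not the architecture or policy used in the paper.

```python
# Minimal sketch (assumed architecture, not the authors' implementation):
# device A runs the head of the network, compresses the intermediate
# activation to reduce communication, and device B runs the tail.
import torch
import torch.nn as nn


class HeadOnDeviceA(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
        )
        # 1x1 bottleneck that reduces channels before transmission;
        # a stand-in for the transmission policy found by NAS.
        self.compress = nn.Conv2d(64, 8, 1)

    def forward(self, x):
        return self.compress(self.features(x))


class TailOnDeviceB(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.decompress = nn.Conv2d(8, 64, 1)
        self.classifier = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, num_classes)
        )

    def forward(self, z):
        return self.classifier(self.decompress(z))


head, tail = HeadOnDeviceA().eval(), TailOnDeviceB().eval()
x = torch.randn(1, 3, 32, 32)
with torch.no_grad():
    z = head(x)                    # computed on device A
    payload = z.numpy().tobytes()  # bytes that would cross the network link
    logits = tail(z)               # computed on device B
print(f"transmitted {len(payload)} bytes, logits shape {tuple(logits.shape)}")
```

In this toy split, the 8-channel bottleneck sends an 8x smaller activation than the raw 64-channel feature map would, which is the kind of communication/accuracy trade-off the NAS search is balancing.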


