Optimizing Memory Placement using Evolutionary Graph Reinforcement Learning

by Shauharda Khadka, et al.

As modern neural networks have grown to billions of parameters, meeting tight latency budgets has become increasingly challenging. Approaches like compression, sparsification and network pruning have proven effective at tackling this problem, but they rely on modifying the underlying network. In this paper, we look at a complementary approach: optimizing how tensors are mapped to on-chip memory in an inference accelerator while leaving the network parameters untouched. Since different memory components trade off capacity for bandwidth differently, a sub-optimal mapping can result in high latency. We introduce evolutionary graph reinforcement learning (EGRL), a method combining graph neural networks, reinforcement learning (RL) and evolutionary search, that aims to find the optimal mapping to minimize latency. Furthermore, a set of fast, stateless policies guides the evolutionary search to improve sample-efficiency. We train and validate our approach directly on the Intel NNP-I chip for inference using a batch size of 1. EGRL outperforms policy-gradient, evolutionary search and dynamic programming baselines on BERT, ResNet-101 and ResNet-50. We achieve a 28-78% speed-up compared to the native NNP-I compiler on all three workloads.
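To make the memory-placement problem concrete, the following is a minimal sketch of the evolutionary-search ingredient alone, run on a toy cost model. It is not the EGRL method: there is no graph neural network, no RL policy and no hardware in the loop. The memory names, capacities, bandwidths, tensor sizes and the `latency`, `mutate` and `evolve` functions are all hypothetical and chosen only for illustration.

```python
import random

# Hypothetical memory levels: (name, capacity in bytes, bandwidth in bytes per time unit).
# Larger memories are slower, mirroring the capacity/bandwidth trade-off.
MEMORIES = [("DRAM", 10**9, 1.0), ("LLC", 4000, 4.0), ("SRAM", 1000, 16.0)]


def latency(mapping, sizes):
    """Toy cost: summed transfer time of all tensors, or infinity if a memory overflows."""
    used = [0] * len(MEMORIES)
    total = 0.0
    for mem, size in zip(mapping, sizes):
        used[mem] += size
        total += size / MEMORIES[mem][2]
    for amount, (_, capacity, _) in zip(used, MEMORIES):
        if amount > capacity:
            return float("inf")  # invalid mapping: capacity exceeded
    return total


def mutate(mapping, rng, rate=0.2):
    """Reassign each tensor to a random memory with probability `rate`."""
    return [rng.randrange(len(MEMORIES)) if rng.random() < rate else m
            for m in mapping]


def evolve(sizes, generations=200, pop_size=20, seed=0):
    """Elitist evolutionary search over tensor-to-memory mappings."""
    rng = random.Random(seed)
    baseline = [0] * len(sizes)  # everything in DRAM: always valid in this toy setup
    population = [baseline] + [mutate(baseline, rng, rate=0.5)
                               for _ in range(pop_size - 1)]
    for _ in range(generations):
        population.sort(key=lambda m: latency(m, sizes))
        elites = population[: pop_size // 4]  # keep the best quarter unchanged
        children = [mutate(rng.choice(elites), rng)
                    for _ in range(pop_size - len(elites))]
        population = elites + children
    return min(population, key=lambda m: latency(m, sizes))


if __name__ == "__main__":
    sizes = [500, 300, 2000, 800, 1500]  # hypothetical tensor sizes in bytes
    best = evolve(sizes)
    print([MEMORIES[m][0] for m in best], latency(best, sizes))
```

Because the elites are carried over unchanged each generation, the returned mapping is never worse than the all-DRAM baseline under this toy cost. In the setting the abstract describes, the cost would instead be latency measured on the accelerator, and learned policies would guide the search rather than uniform random mutation.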


