NeuMMU: Architectural Support for Efficient Address Translations in Neural Processing Units

11/15/2019
by   Bongjoon Hyun, et al.

To satisfy the compute and memory demands of deep neural networks, neural processing units (NPUs) are being widely utilized for accelerating deep learning algorithms. Similar to how GPUs have evolved from a slave device into a mainstream processor architecture, it is likely that NPUs will become first-class citizens in this fast-evolving heterogeneous architecture space. This paper makes a case for enabling address translation in NPUs to decouple the virtual and physical memory address spaces. Through a careful data-driven application characterization study, we root-cause several limitations of prior GPU-centric address translation schemes and propose a memory management unit (MMU) that is tailored for NPUs. Compared to an oracular MMU design point, our proposal incurs only an average 0.06


research
12/01/2016

Near-Memory Address Translation

Memory and logic integration on the same chip is becoming increasingly c...
research
08/16/2017

Improving Multi-Application Concurrency Support Within the GPU Memory System

GPUs exploit a high degree of thread-level parallelism to hide long-late...
research
08/01/2020

DeACT: Architecture-Aware Virtual Memory Support for Fabric Attached Memory Systems

The exponential growth of data has driven technology providers to develo...
research
08/12/2019

Design space exploration of Ferroelectric FET based Processing-in-Memory DNN Accelerator

In this letter, we quantify the impact of device limitations on the clas...
research
02/18/2019

Beyond the Memory Wall: A Case for Memory-centric HPC System for Deep Learning

As the models and the datasets to train deep learning (DL) models scale,...
research
02/19/2018

A Scalable Near-Memory Architecture for Training Deep Neural Networks on Large In-Memory Datasets

Most investigations into near-memory hardware accelerators for deep neur...
research
08/08/2023

Collaborative Acceleration for FFT on Commercial Processing-In-Memory Architectures

This paper evaluates the efficacy of recent commercial processing-in-mem...
