Accelerating Neural Network Inference with Processing-in-DRAM: From the Edge to the Cloud

by   Geraldo F. Oliveira, et al.

Neural networks (NNs) are growing in importance and complexity. A neural network's performance (and energy efficiency) can be bound either by computation or memory resources. The processing-in-memory (PIM) paradigm, where computation is placed near or within memory arrays, is a viable solution to accelerate memory-bound NNs. However, PIM architectures vary in form, where different PIM approaches lead to different trade-offs. Our goal is to analyze, discuss, and contrast DRAM-based PIM architectures for NN performance and energy efficiency. To do so, we analyze three state-of-the-art PIM architectures: (1) UPMEM, which integrates processors and DRAM arrays into a single 2D chip; (2) Mensa, a 3D-stack-based PIM architecture tailored for edge devices; and (3) SIMDRAM, which uses the analog principles of DRAM to execute bit-serial operations. Our analysis reveals that PIM greatly benefits memory-bound NNs: (1) UPMEM provides 23x the performance of a high-end GPU when the GPU requires memory oversubscription for a general matrix-vector multiplication kernel; (2) Mensa improves energy efficiency and throughput by 3.0x and 3.1x over the Google Edge TPU for 24 Google edge NN models; and (3) SIMDRAM outperforms a CPU/GPU by 16.7x/1.4x for three binary NNs. We conclude that the ideal PIM architecture for NN models depends on a model's distinct attributes, due to the inherent architectural design choices.


page 1

page 12


Accelerating Bulk Bit-Wise X(N)OR Operation in Processing-in-DRAM Platform

With Von-Neumann computing architectures struggling to address computati...

Mitigating Edge Machine Learning Inference Bottlenecks: An Empirical Study on Accelerating Google Edge Models

As the need for edge computing grows, many modern consumer devices now c...

NEON: Enabling Efficient Support for Nonlinear Operations in Resistive RAM-based Neural Network Accelerators

Resistive Random-Access Memory (RRAM) is well-suited to accelerate neura...

In-Datacenter Performance Analysis of a Tensor Processing Unit

Many architects believe that major improvements in cost-energy-performan...

Position-Agnostic Multi-Microphone Speech Dereverberation

Neural networks (NNs) have been widely applied in speech processing task...

Exploring Deep Neural Networks on Edge TPU

This paper explores the performance of Google's Edge TPU on feed forward...

NVMExplorer: A Framework for Cross-Stack Comparisons of Embedded Non-Volatile Memories

Repeated off-chip memory accesses to DRAM drive up operating power for d...

Please sign up or login with your details

Forgot password? Click here to reset