Inclusive-PIM: Hardware-Software Co-design for Broad Acceleration on Commercial PIM Architectures

09/14/2023
by Johnathan Alsop, et al.

Continual demand for memory bandwidth has made it worthwhile for memory vendors to reassess processing in memory (PIM), which enables higher bandwidth by placing compute units in or near memory. As such, memory vendors have recently proposed commercially viable PIM designs. However, these proposals are largely driven by the needs of (a narrow set of) machine learning (ML) primitives. While such proposals are reasonable given the growing importance of ML, memory is a pervasive component, so there is a case for a more inclusive PIM design that can accelerate primitives across domains. In this work, we ascertain the capabilities of commercial PIM proposals to accelerate various primitives across domains. We begin by outlining a set of characteristics, termed the PIM-amenability-test, which aid in assessing whether a given primitive is likely to be accelerated by PIM. Next, we apply this test to the primitives under study to ascertain efficient data placement and orchestration for mapping the primitives to the underlying PIM architecture. We observe that, even though the primitives under study are largely PIM-amenable, existing commercial PIM proposals do not realize their performance potential for these primitives. To address this, we identify bottlenecks that arise in PIM execution and propose hardware and software optimizations that stand to broaden the acceleration reach of commercial PIM designs (improving average PIM speedups from 1.12x to 2.49x relative to a GPU baseline). Overall, while we believe emerging commercial PIM proposals add a necessary and complementary design point in the application acceleration space, hardware-software co-design is necessary to deliver their benefits broadly.

research
08/08/2023

Collaborative Acceleration for FFT on Commercial Processing-In-Memory Architectures

This paper evaluates the efficacy of recent commercial processing-in-mem...
research
12/30/2018

ORIGAMI: A Heterogeneous Split Architecture for In-Memory Acceleration of Learning

The memory bandwidth bottleneck is a major challenge in processing machine ...
research
07/02/2020

Hardware Acceleration of Sparse and Irregular Tensor Computations of ML Models: A Survey and Insights

Machine learning (ML) models are widely used in many domains including m...
research
04/24/2022

Hardware Acceleration for Third-Generation FHE and PSI Based on It

With the expansion of cloud services, serious concerns about the privacy...
research
06/03/2018

Elasticizing Linux via Joint Disaggregation of Memory and Computation

In this paper, we propose a set of operating system primitives which pro...
research
08/04/2023

Exploiting On-chip Heterogeneity of Versal Architecture for GNN Inference Acceleration

Graph Neural Networks (GNNs) have revolutionized many Machine Learning (...
research
05/12/2021

Guardian: symbolic validation of orderliness in SGX enclaves

Modern processors can offer hardware primitives that allow a process to ...