Retrospective: A Scalable Processing-in-Memory Accelerator for Parallel Graph Processing

06/27/2023
by Junwhan Ahn, et al.

Our ISCA 2015 paper presents a new programmable processing-in-memory (PIM) architecture and system design that can accelerate key data-intensive applications, with a focus on graph processing workloads. Our major idea was to completely rethink the system, including the programming model, data partitioning mechanisms, system support, and instruction set architecture, along with the near-memory execution units and their communication architecture, such that an important workload can be accelerated as much as possible using a distributed system of well-connected near-memory accelerators. We built our accelerator system, Tesseract, using 3D-stacked memories with logic layers, where each logic layer contains general-purpose processing cores, and the cores communicate with each other using a message-passing programming model. Cores could be specialized for graph processing (or any other application to be accelerated). To our knowledge, our paper was the first to completely design a near-memory accelerator system from scratch such that it is both generally programmable and specifically customizable to accelerate important applications, with a case study on major graph processing workloads. Ensuing work in academia and industry showed that similar approaches to system design can greatly benefit both graph processing workloads and other applications, such as machine learning, for which ideas from Tesseract seem to have been influential. This short retrospective provides a brief analysis of our ISCA 2015 paper and its impact. We briefly describe the major ideas and contributions of the work, discuss later works that built on it or were influenced by it, and make some educated guesses on what the future may bring for PIM and accelerator systems.

