Machine Learning Training on a Real Processing-in-Memory System

by   Juan Gomez-Luna, et al.

Training machine learning algorithms is a computationally intensive process, which is frequently memory-bound due to repeated accesses to large training datasets. As a result, processor-centric systems (e.g., CPU, GPU) suffer from costly data movement between memory units and processing units, which consumes large amounts of energy and execution cycles. Memory-centric computing systems, i.e., computing systems with processing-in-memory (PIM) capabilities, can alleviate this data movement bottleneck. Our goal is to understand the potential of modern general-purpose PIM architectures to accelerate machine learning training. To do so, we (1) implement several representative classic machine learning algorithms (namely, linear regression, logistic regression, decision tree, and K-means clustering) on a real-world general-purpose PIM architecture, (2) characterize them in terms of accuracy, performance, and scaling, and (3) compare them to their counterpart implementations on CPU and GPU. Our experimental evaluation on a memory-centric computing system with more than 2500 PIM cores shows that general-purpose PIM architectures can greatly accelerate memory-bound machine learning workloads when the necessary operations and datatypes are natively supported by PIM hardware. To our knowledge, our work is the first to evaluate training of machine learning algorithms on a real-world general-purpose PIM architecture.
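To illustrate why such training is memory-bound, consider a minimal sketch (not from the paper, and not PIM code) of gradient-descent linear regression in NumPy: every epoch streams the entire training set through the processor, so on a processor-centric system the dataset is moved from memory to the compute units over and over again. PIM architectures aim to perform exactly this kind of repeated, low-arithmetic-intensity work near where the data resides.

```python
import numpy as np

def train_linear_regression(X, y, lr=0.1, epochs=500):
    """Batch gradient descent for linear regression.

    Each epoch reads the full dataset X (two passes: X @ w and X.T @ err),
    which is the repeated-dataset-access pattern that makes training
    memory-bound on processor-centric systems.
    """
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(epochs):
        pred = X @ w + b              # full pass over the training set
        err = pred - y
        w -= lr * (X.T @ err) / n     # second full pass: weight gradient
        b -= lr * err.mean()          # bias gradient
    return w, b

# Usage sketch: recover known weights from synthetic data.
rng = np.random.RandomState(0)
X = rng.randn(200, 2)
y = X @ np.array([2.0, -1.0]) + 0.5
w, b = train_linear_regression(X, y)
```

The other workloads the paper evaluates (logistic regression, decision tree, K-means) share this structure: each training iteration scans the whole dataset while performing only a few arithmetic operations per element.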


Related papers:

- An Experimental Evaluation of Machine Learning Training on a Real Processing-in-Memory System
- TransPimLib: A Library for Efficient Transcendental Functions on Processing-in-Memory Systems
- A Framework for High-throughput Sequence Alignment using Real Processing-in-Memory Systems
- ALPINE: Analog In-Memory Acceleration with Tight Processor Integration for Deep Learning
- One-step regression and classification with crosspoint resistive memory arrays
- Overcoming Limitations of GPGPU-Computing in Scientific Applications
- Processor in Non-Volatile Memory (PiNVSM): Towards to Data-centric Computing in Decentralized Environment
