TransPimLib: A Library for Efficient Transcendental Functions on Processing-in-Memory Systems

by Maurus Item, et al.

Processing-in-memory (PIM) promises to alleviate the data movement bottleneck in modern computing systems. However, current real-world PIM systems have the inherent disadvantage that their hardware is more constrained than in conventional processors (CPU, GPU), due to the difficulty and cost of building processing elements near or inside the memory. As a result, general-purpose PIM architectures support fairly limited instruction sets and struggle to execute complex operations such as transcendental functions and other hard-to-calculate operations (e.g., square root). These operations are particularly important for some modern workloads, e.g., activation functions in machine learning applications. In order to provide support for transcendental (and other hard-to-calculate) functions in general-purpose PIM systems, we present TransPimLib, a library that provides CORDIC-based and LUT-based methods for trigonometric functions, hyperbolic functions, exponentiation, logarithm, square root, etc. We develop an implementation of TransPimLib for the UPMEM PIM architecture and perform a thorough evaluation of TransPimLib's methods in terms of performance and accuracy, using microbenchmarks and three full workloads (Blackscholes, Sigmoid, Softmax). We open-source all our code and datasets at <>.
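To make the two method families named in the abstract concrete, here is a minimal sketch of a CORDIC-based and a LUT-based sine in Python. It is an illustration only, not TransPimLib's API or implementation: the function names (`cordic_sin_cos`, `lut_sin`), the iteration count, the table size, and the use of floating point and linear interpolation are all assumptions made for this example. A PIM implementation would more likely use fixed-point arithmetic, where the multiplications by powers of two become shifts.

```python
import math

# --- CORDIC (rotation mode), illustrative parameters ---
N_ITER = 24  # assumed iteration count; more iterations give more accuracy
# Elementary rotation angles atan(2^-i), precomputed once.
ANGLES = [math.atan(2.0 ** -i) for i in range(N_ITER)]
# Each micro-rotation stretches the vector; pre-scale to compensate.
GAIN = 1.0
for i in range(N_ITER):
    GAIN /= math.sqrt(1.0 + 2.0 ** (-2 * i))


def cordic_sin_cos(theta):
    """Return (sin, cos) of theta for theta in [-pi/2, pi/2]."""
    x, y, z = GAIN, 0.0, theta
    for i in range(N_ITER):
        d = 1.0 if z >= 0.0 else -1.0  # rotate toward zero residual angle
        scale = 2.0 ** -i              # a right shift in fixed point
        x, y = x - d * y * scale, y + d * x * scale
        z -= d * ANGLES[i]
    return y, x


# --- LUT with linear interpolation, illustrative parameters ---
LUT_SIZE = 1024  # assumed number of intervals over [0, pi/2]
LUT_STEP = (math.pi / 2) / LUT_SIZE
SIN_LUT = [math.sin(k * LUT_STEP) for k in range(LUT_SIZE + 1)]


def lut_sin(theta):
    """Return an approximation of sin(theta) for theta in [0, pi/2]."""
    pos = theta / LUT_STEP
    k = min(int(pos), LUT_SIZE - 1)  # clamp so k + 1 stays in range
    frac = pos - k
    return SIN_LUT[k] + frac * (SIN_LUT[k + 1] - SIN_LUT[k])
```

The sketch shows the basic trade-off between the two approaches: CORDIC needs only a small table of angles but runs one iteration per bit of accuracy, while the LUT answers in a single lookup plus an interpolation at the cost of a larger table held in memory.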




