Heterogeneous Sparse Matrix-Vector Multiplication via Compressed Sparse Row Format

03/10/2022
by   Phillip Allen Lane, et al.

Sparse matrix-vector multiplication (SpMV) is one of the most important kernels in high-performance computing (HPC), yet it typically performs poorly on many devices. As a result, SpMV normally requires special care in how the matrix is stored and how the kernel is tuned for a given device. Moreover, HPC is facing heterogeneous hardware containing multiple different compute units, e.g., many-core CPUs and GPUs. An emerging goal has therefore been to produce heterogeneous formats and methods that allow critical kernels such as SpMV to be executed on different devices with portable performance and minimal changes to format and method. This paper presents a heterogeneous format based on compressed sparse row (CSR), named CSR-k, that can be tuned quickly. CSR-k outperforms the average performance of Intel MKL on Intel Xeon Platinum 8380 and AMD Epyc 7742 CPUs while still outperforming NVIDIA's cuSPARSE and Sandia National Laboratories' KokkosKernels on NVIDIA A100 and V100 GPUs for regular sparse matrices, i.e., sparse matrices where the variance of the number of nonzeros per row is ≤ 10, such as those commonly generated from two- and three-dimensional finite difference and finite element problems. In particular, CSR-k achieves this with reordering and by grouping rows into a hierarchical structure of super-rows and super-super-rows that are represented by just a few extra arrays of pointers. Due to its simplicity, a model can be tuned for a device and used to select super-row and super-super-row sizes in constant time.
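
The idea of layering "a few extra arrays of pointers" on top of CSR can be illustrated with a small sketch. The code below is not the paper's implementation: it assumes a single extra level (super-rows only), forms super-rows from consecutive rows of a fixed size rather than through the reordering the abstract mentions, and uses population variance for the regularity check, which the abstract does not specify. The names `spmv_csr`, `build_super_row_ptr`, `spmv_super_rows`, and `nnz_per_row_variance` are hypothetical.

```python
from statistics import pvariance


def spmv_csr(row_ptr, col_idx, vals, x):
    """Plain CSR SpMV: returns y = A @ x."""
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):
        acc = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += vals[k] * x[col_idx[k]]
        y[i] = acc
    return y


def build_super_row_ptr(n_rows, super_row_size):
    """Assumption: super-rows are runs of consecutive rows of a fixed size.

    Entry s holds the index of the first row in super-row s; the last
    entry is n_rows, mirroring how row_ptr works in CSR.
    """
    ptr = list(range(0, n_rows, super_row_size))
    ptr.append(n_rows)
    return ptr


def spmv_super_rows(sr_ptr, row_ptr, col_idx, vals, x):
    """SpMV that walks super-rows first, then the rows inside each one.

    The outer loop is the natural unit to hand to a thread or thread
    block; here it runs sequentially to keep the sketch simple.
    """
    y = [0.0] * (len(row_ptr) - 1)
    for s in range(len(sr_ptr) - 1):
        for i in range(sr_ptr[s], sr_ptr[s + 1]):
            acc = 0.0
            for k in range(row_ptr[i], row_ptr[i + 1]):
                acc += vals[k] * x[col_idx[k]]
            y[i] = acc
    return y


def nnz_per_row_variance(row_ptr):
    """Population variance of nonzeros per row (choice of estimator is assumed)."""
    counts = [row_ptr[i + 1] - row_ptr[i] for i in range(len(row_ptr) - 1)]
    return pvariance(counts)


# Example: 4x4 tridiagonal matrix from a 1D finite-difference Laplacian.
row_ptr = [0, 2, 5, 8, 10]
col_idx = [0, 1, 0, 1, 2, 1, 2, 3, 2, 3]
vals = [2.0, -1.0, -1.0, 2.0, -1.0, -1.0, 2.0, -1.0, -1.0, 2.0]
x = [1.0, 1.0, 1.0, 1.0]

sr_ptr = build_super_row_ptr(len(row_ptr) - 1, super_row_size=2)
y_flat = spmv_csr(row_ptr, col_idx, vals, x)
y_grouped = spmv_super_rows(sr_ptr, row_ptr, col_idx, vals, x)
is_regular = nnz_per_row_variance(row_ptr) <= 10
```

Because the underlying `row_ptr`, `col_idx`, and `vals` arrays are untouched, the grouped traversal produces the same result as plain CSR; only the extra pointer array and the loop nesting change. A super-super-row level would add one more pointer array indexing into `sr_ptr` in the same way.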


research
07/23/2013

A unified sparse matrix data format for efficient general sparse matrix-vector multiply on modern processors with wide SIMD units

Sparse matrix-vector multiplication (spMVM) is the most time-consuming k...
research
12/19/2021

FSpGEMM: An OpenCL-based HPC Framework for Accelerating General Sparse Matrix-Matrix Multiplication on FPGAs

General sparse matrix-matrix multiplication (SpGEMM) is an integral part...
research
12/13/2018

Javelin: A Scalable Implementation for Sparse Incomplete LU Factorization

In this work, we present a new scalable incomplete LU factorization fram...
research
11/15/2017

Performance Analysis and Optimization of Sparse Matrix-Vector Multiplication on Modern Multi- and Many-Core Processors

This paper presents a low-overhead optimizer for the ubiquitous sparse m...
research
07/08/2023

Rosko: Row Skipping Outer Products for Sparse Matrix Multiplication Kernels

We propose Rosko – row skipping outer products – for deriving sparse mat...
research
03/09/2023

Optimizing Sparse Linear Algebra Through Automatic Format Selection and Machine Learning

Sparse matrices are an integral part of scientific simulations. As hardw...
research
10/19/2020

SlimSell: A Vectorizable Graph Representation for Breadth-First Search

Vectorization and GPUs will profoundly change graph processing. Traditio...
