Efficient Sparse Matrix Kernels based on Adaptive Workload-Balancing and Parallel-Reduction

06/30/2021
by Guyue Huang, et al.

Sparse matrix-vector and matrix-matrix multiplication (SpMV and SpMM) are fundamental in both conventional (graph analytics, scientific computing) and emerging (sparse DNN, GNN) domains. Workload-balancing and parallel-reduction are widely used design principles for efficient SpMV. However, prior work fails to resolve how to implement and adaptively use the two principles for SpMV/MM. To overcome this obstacle, we first complete the implementation space with optimizations by filling three missing pieces in prior work: (1) We show that workload-balancing and parallel-reduction can be combined through a segment-reduction algorithm implemented with SIMD-shuffle primitives. (2) We show that parallel-reduction can be implemented in SpMM by loading the dense-matrix rows with vector memory operations. (3) We show that vectorized loading of sparse rows, which accounts for part of the benefit of parallel-reduction, can coexist with sequential reduction in SpMM by temporally caching sparse-matrix elements in shared memory. In terms of adaptive use, we analyze how the benefits of the two principles change with two characteristics of the input data space: the diverse sparsity pattern and the dense-matrix width. We find that the benefit of the two principles fades as the total workload, i.e., the dense-matrix width, increases. We also identify, for SpMV and SpMM, different sparse-matrix features that impact workload-balancing effectiveness. Our design consistently outperforms cuSPARSE by 1.07-1.57x across different GPUs and dense-matrix widths, and the kernel selection rules incur only 5-12% performance loss compared with optimal choices. Our kernels are being integrated into popular graph learning frameworks to accelerate GNN training.
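The segment-reduction idea in point (1) can be sketched in plain CUDA. The kernel below is a minimal illustrative sketch, not the paper's released implementation: it assumes CSR storage, assigns 32 consecutive nonzeros to each warp (workload-balancing by nonzero-splitting), and combines the partial products with a shuffle-based segmented scan (parallel-reduction). The names `spmv_nnz_split` and `find_row` are hypothetical.

```cuda
// seg_spmv.cu -- illustrative sketch only; compile with: nvcc -arch=sm_70 seg_spmv.cu
#include <cuda_runtime.h>

// Binary search: find the row containing nonzero index i,
// i.e. the largest r with row_ptr[r] <= i.
__device__ int find_row(const int* row_ptr, int num_rows, int i) {
    int lo = 0, hi = num_rows;              // invariant: row_ptr[lo] <= i < row_ptr[hi]
    while (hi - lo > 1) {
        int mid = (lo + hi) >> 1;
        if (row_ptr[mid] <= i) lo = mid; else hi = mid;
    }
    return lo;
}

// Workload-balanced SpMV: each warp takes 32 consecutive nonzeros, then
// merges partial products with a shuffle-based segmented reduction.
// y must be zero-initialized before launch (rows may span warps, so
// segment totals are accumulated with atomicAdd).
__global__ void spmv_nnz_split(int num_rows, int nnz,
                               const int* __restrict__ row_ptr,
                               const int* __restrict__ col_idx,
                               const float* __restrict__ vals,
                               const float* __restrict__ x,
                               float* __restrict__ y) {
    int lane = threadIdx.x & 31;
    int gidx = blockIdx.x * blockDim.x + threadIdx.x;   // one nonzero per lane
    int i    = min(gidx, nnz - 1);                      // clamp tail lanes
    float prod = (gidx < nnz) ? vals[i] * x[col_idx[i]] : 0.0f;
    int row = find_row(row_ptr, num_rows, i);

    // Segment heads: lane 0, or the row changed relative to the previous lane.
    int prev_row = __shfl_up_sync(0xffffffffu, row, 1);
    bool head = (lane == 0) || (row != prev_row);
    unsigned head_mask = __ballot_sync(0xffffffffu, head);

    // Index of this lane's segment head within the warp.
    unsigned le_mask = 0xffffffffu >> (31 - lane);      // bits 0..lane set
    int head_lane = 31 - __clz(head_mask & le_mask);

    // Shuffle-based inclusive segmented scan (Hillis-Steele style):
    // only add a neighbor's partial sum if it lies in the same segment.
    #pragma unroll
    for (int d = 1; d < 32; d <<= 1) {
        float up = __shfl_up_sync(0xffffffffu, prod, d);
        if (lane - d >= head_lane) prod += up;
    }

    // The last lane of each segment now holds the segment total.
    int next_row = __shfl_down_sync(0xffffffffu, row, 1);
    if (lane == 31 || row != next_row) atomicAdd(&y[row], prod);
}

// Launch (host side): zero y, then e.g.
//   int threads = 128, blocks = (nnz + threads - 1) / threads;
//   spmv_nnz_split<<<blocks, threads>>>(num_rows, nnz, row_ptr, col_idx, vals, x, y);
```

The atomicAdd on segment tails covers the carry-out case that any nonzero-splitting scheme must handle, namely rows that straddle warp boundaries. The SpMM analogue described in point (2) would, under the same assumptions, replace the scalar `x[col_idx[i]]` load with vector memory operations (e.g. float4 loads) over rows of the dense matrix.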
