Communication-Avoiding and Memory-Constrained Sparse Matrix-Matrix Multiplication at Extreme Scale

10/16/2020
by Md Taufique Hussain, et al.

Sparse matrix-matrix multiplication (SpGEMM) is a widely used kernel in various graph, scientific computing and machine learning algorithms. In this paper, we consider SpGEMMs performed on hundreds of thousands of processors generating trillions of nonzeros in the output matrix. Distributed SpGEMM at this extreme scale faces two key challenges: (1) high communication cost and (2) inadequate memory to generate the output. We address these challenges with an integrated communication-avoiding and memory-constrained SpGEMM algorithm that scales to 262,144 cores (more than 1 million hardware threads) and can multiply sparse matrices of any size as long as inputs and a fraction of output fit in the aggregated memory. As we go from 16,384 cores to 262,144 cores on a Cray XC40 supercomputer, the new SpGEMM algorithm runs 10x faster when multiplying large-scale protein-similarity matrices.
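To make the kernel concrete, below is a minimal serial sketch of SpGEMM using SciPy's sparse module. This is an illustration only: the matrix sizes, densities, and the column-batched loop are assumptions chosen for demonstration, not the paper's distributed, communication-avoiding implementation, which runs across hundreds of thousands of MPI processes.

```python
import scipy.sparse as sp

# Toy stand-ins for the large sparse inputs (e.g., protein-similarity matrices).
n = 10_000
A = sp.random(n, n, density=0.001, format="csr", random_state=0)
B = sp.random(n, n, density=0.001, format="csr", random_state=1)

# Plain SpGEMM: the output C is sparse, but typically far denser than either
# input, which is why memory for C (not A or B) becomes the bottleneck at scale.
C = A @ B
print(f"nnz(A)={A.nnz:,}  nnz(B)={B.nnz:,}  nnz(C)={C.nnz:,}")

# Crude single-node analogue of the memory-constrained idea: form C in column
# batches so only a slice of the output needs to be resident at any one time.
# (The batch size here is an arbitrary illustrative choice.)
batch = n // 4
for start in range(0, n, batch):
    C_block = A @ B[:, start:start + batch]  # one column slab of C
    # ... consume or write out C_block here, then release it ...
    del C_block
```

The batched loop only mirrors the abstract's requirement that "a fraction of output fit in the aggregated memory"; how the paper partitions the output and overlaps it with communication avoidance is described in the full text.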


Related research

02/26/2020 · Bandwidth-Optimized Parallel Algorithms for Sparse Matrix-Matrix Multiplication using Propagation Blocking
Sparse matrix-matrix multiplication (SpGEMM) is a widely used kernel in ...

03/15/2022 · Distributed-Memory Sparse Kernels for Machine Learning
Sampled Dense Times Dense Matrix Multiplication (SDDMM) and Sparse Times...

10/26/2016 · The Reverse Cuthill-McKee Algorithm in Distributed-Memory
Ordering vertices of a graph is key to minimize fill-in and data structu...

08/25/2023 · SAGE: A Storage-Based Approach for Scalable and Efficient Sparse Generalized Matrix-Matrix Multiplication
Sparse generalized matrix-matrix multiplication (SpGEMM) is a fundamenta...

06/06/2023 · Towards Memory-Efficient Training for Extremely Large Output Spaces – Learning with 500k Labels on a Single Commodity GPU
In classification problems with large output spaces (up to millions of l...

11/20/2019 · Parallel Implementations for Computing the Minimum Distance of a Random Linear Code on Multicomputers
The minimum distance of a linear code is a key concept in information th...

09/24/2019 · A high-level characterisation and generalisation of communication-avoiding programming techniques
Today's hardware's explosion of concurrency plus the explosion of data w...
