Efficient Interleaved Batch Matrix Solvers for CUDA

09/10/2019
by Andrew Gloster, et al.

In this paper we present a new methodology for data accesses when solving batches of tridiagonal and pentadiagonal systems that all share the same left-hand-side (LHS) matrix. Storing only one copy of this matrix significantly reduces storage overheads, and we show that it also reduces compute time. Together, these two results yield a more efficient implementation than the current state-of-the-art algorithms cuThomasBatch and cuPentBatch, allowing a greater number of systems to be solved on a single GPU.
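
The idea of a single shared LHS with an interleaved batch of right-hand sides can be illustrated with a small CPU sketch in Python. This is not the paper's CUDA kernel: the function names `thomas_factor` and `solve_interleaved`, the decision to pre-factor the LHS once, and the layout index `i * batch + j` are all assumptions made for illustration. On a GPU, each value of `j` would correspond to one thread, so neighbouring threads would touch neighbouring memory locations.

```python
def thomas_factor(lower, diag, upper):
    """Pre-factor ONE tridiagonal LHS that every system in the batch shares.

    lower[0] and upper[-1] are unused padding entries.
    Hypothetical helper, not taken from the paper.
    """
    n = len(diag)
    c_prime = [0.0] * n  # modified upper diagonal
    denom = [0.0] * n    # pivots of the forward sweep
    denom[0] = diag[0]
    c_prime[0] = upper[0] / denom[0]
    for i in range(1, n):
        denom[i] = diag[i] - lower[i] * c_prime[i - 1]
        c_prime[i] = upper[i] / denom[i]
    return c_prime, denom


def solve_interleaved(lower, c_prime, denom, rhs, n, batch):
    """Solve `batch` systems in place using the single factored LHS.

    rhs is a flat list in interleaved layout: row i of system j is stored
    at index i * batch + j (an assumed layout for this sketch).
    """
    for j in range(batch):  # stands in for one GPU thread per system
        # Forward sweep
        rhs[j] = rhs[j] / denom[0]
        for i in range(1, n):
            idx = i * batch + j
            rhs[idx] = (rhs[idx] - lower[i] * rhs[idx - batch]) / denom[i]
        # Backward substitution
        for i in range(n - 2, -1, -1):
            idx = i * batch + j
            rhs[idx] -= c_prime[i] * rhs[idx + batch]
    return rhs


# Usage: a 4x4 matrix with 2 on the diagonal and -1 off it, two systems.
n, batch = 4, 2
lower = [0.0, -1.0, -1.0, -1.0]
diag = [2.0, 2.0, 2.0, 2.0]
upper = [-1.0, -1.0, -1.0, 0.0]
c_prime, denom = thomas_factor(lower, diag, upper)

# Right-hand sides for system 0 = [0, 0, 0, 5] and system 1 = [1, 0, 0, 1],
# stored interleaved row by row.
rhs = [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 5.0, 1.0]
x = solve_interleaved(lower, c_prime, denom, rhs, n, batch)
```

Only the three diagonals (and their factored form) are stored once, regardless of `batch`; the per-system storage is just the right-hand side, which is overwritten with the solution.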


