Implementing Strassen's Algorithm with BLIS

05/03/2016
by Jianyu Huang, et al.

We dispel "street wisdom" regarding the practical implementation of Strassen's algorithm for matrix-matrix multiplication (DGEMM). Conventional wisdom: it is only practical for very large matrices. Our implementation is practical for small matrices. Conventional wisdom: the matrices being multiplied should be relatively square. Our implementation is practical for rank-k updates, where k is relatively small (a shape of importance for libraries like LAPACK). Conventional wisdom: it inherently requires substantial workspace. Our implementation requires no workspace beyond the buffers already incorporated into conventional high-performance DGEMM implementations. Conventional wisdom: a Strassen DGEMM interface must pass in workspace. Our implementation requires no such workspace and can be plug-compatible with the standard DGEMM interface. Conventional wisdom: it is hard to demonstrate speedup on multi-core architectures. Our implementation demonstrates speedup over conventional DGEMM even on an Intel(R) Xeon Phi(TM) coprocessor utilizing 240 threads. We also show how a distributed-memory matrix-matrix multiplication benefits from these advances.
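For readers unfamiliar with the algorithm being discussed, the following is a minimal sketch of textbook one-level Strassen in pure Python. It is not the paper's BLIS-integrated implementation: it allocates temporaries freely (exactly the workspace the abstract says the authors avoid), and the helper names `mat_add`, `mat_mul`, `split`, and `strassen_one_level` are hypothetical. It assumes all three dimensions m, k, and n are even, and uses a naive triple-loop product as a stand-in for DGEMM.

```python
def mat_add(X, Y, sign=1):
    """Return X + sign * Y for equally shaped lists of lists."""
    return [[x + sign * y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]


def mat_mul(X, Y):
    """Naive product, standing in for a conventional DGEMM call."""
    cols = list(zip(*Y))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in X]


def split(X):
    """Partition X into four quadrants (both dimensions assumed even)."""
    h, w = len(X) // 2, len(X[0]) // 2
    return ([r[:w] for r in X[:h]], [r[w:] for r in X[:h]],
            [r[:w] for r in X[h:]], [r[w:] for r in X[h:]])


def strassen_one_level(A, B):
    """Compute A @ B with 7 submatrix products instead of 8."""
    A00, A01, A10, A11 = split(A)
    B00, B01, B10, B11 = split(B)

    # The seven Strassen products on half-sized operands.
    M0 = mat_mul(mat_add(A00, A11), mat_add(B00, B11))
    M1 = mat_mul(mat_add(A10, A11), B00)
    M2 = mat_mul(A00, mat_add(B01, B11, -1))
    M3 = mat_mul(A11, mat_add(B10, B00, -1))
    M4 = mat_mul(mat_add(A00, A01), B11)
    M5 = mat_mul(mat_add(A10, A00, -1), mat_add(B00, B01))
    M6 = mat_mul(mat_add(A01, A11, -1), mat_add(B10, B11))

    # Recombine the products into the quadrants of C.
    C00 = mat_add(mat_add(mat_add(M0, M3), M4, -1), M6)
    C01 = mat_add(M2, M4)
    C10 = mat_add(M1, M3)
    C11 = mat_add(mat_add(mat_add(M0, M1, -1), M2), M5)

    top = [left + right for left, right in zip(C00, C01)]
    bottom = [left + right for left, right in zip(C10, C11)]
    return top + bottom
```

The function accepts rectangular inputs, so a rank-k style shape such as a 4×2 matrix times a 2×4 matrix works as long as every dimension is even. Comparing `strassen_one_level(A, B)` against `mat_mul(A, B)` on integer inputs is a quick sanity check, since integer arithmetic avoids floating-point rounding differences between the two formulations.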
