The BLAS API of BLASFEO: optimizing performance for small matrices

02/21/2019
by Gianluca Frison, et al.

BLASFEO is a dense linear algebra library providing high-performance implementations of BLAS- and LAPACK-like routines for use in embedded optimization and other applications targeting relatively small matrices. BLASFEO defines an API which uses a packed matrix format as its native format. This format is analogous to the internal memory buffers of optimized BLAS, but it is exposed to the user and it removes the packing cost from the routine call. For matrices fitting in cache, BLASFEO outperforms optimized BLAS implementations, both open-source and proprietary. This paper investigates the addition of a standard BLAS API to the BLASFEO framework, and proposes an implementation switching between two or more algorithms optimized for different matrix sizes. Thanks to the modular assembly framework in BLASFEO, tailored linear algebra kernels with mixed column- and panel-major arguments are easily developed. This BLAS API has lower performance than the BLASFEO API, but it nonetheless outperforms optimized BLAS and especially LAPACK libraries for matrices fitting in cache. Therefore, it can boost a wide range of applications, where standard BLAS and LAPACK libraries are employed and the matrix size is moderate. In particular, this paper investigates the benefits in scientific programming languages such as Octave, SciPy and Julia.
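To make the packed format mentioned above concrete, the following is a minimal sketch of a panel-major layout, not BLASFEO's actual code or API. It assumes a panel height of 4 rows, pads only the row count to a multiple of the panel height, and uses hypothetical helper names (`panel_offset`, `pack_panel_major`). The matrix is cut into horizontal panels, and within each panel the elements are stored column by column, so the entries of a short column segment sit next to each other in memory.

```python
# Illustrative sketch only: the names and the panel height are assumptions,
# not taken from the BLASFEO sources.

def panel_offset(i, j, n, ps=4):
    """Offset of element (i, j) in a panel-major buffer with n columns.

    Rows are grouped into panels of height ps. Each panel stores its
    columns contiguously, ps elements per column.
    """
    panel = i // ps          # which horizontal panel the row falls in
    row_in_panel = i % ps    # position of the row inside that panel
    return panel * ps * n + j * ps + row_in_panel


def pack_panel_major(a, m, n, ps=4):
    """Copy an m-by-n column-major matrix (flat list, leading dimension m)
    into a zero-padded panel-major buffer."""
    padded_rows = -(-m // ps) * ps      # round m up to a multiple of ps
    buf = [0.0] * (padded_rows * n)
    for j in range(n):
        for i in range(m):
            buf[panel_offset(i, j, n, ps)] = a[i + j * m]
    return buf


if __name__ == "__main__":
    m, n = 5, 2
    # Column-major input where element (i, j) has the value i + 10*j.
    a = [float(i + 10 * j) for j in range(n) for i in range(m)]
    print(pack_panel_major(a, m, n))
```

A standard BLAS call receives column-major arguments, so a conversion of this kind (or a kernel that reads column-major data directly) has to happen inside the routine. That is the packing cost which, according to the abstract, the native BLASFEO API avoids by exposing the packed format to the user.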


