Improving the performance of classical linear algebra iterative methods via hybrid parallelism

We propose fork-join and task-based hybrid implementations of four classical linear algebra iterative methods (Jacobi, Gauss-Seidel, conjugate gradient and biconjugate gradient stabilised) as well as variations of them. Algorithms are duly documented and the corresponding source code is made publicly available for reproducibility. Both weak and strong scalability benchmarks are conducted to statistically analyse their relative efficiencies. The weak scalability results assert the superiority of a task-based hybrid parallelisation over MPI-only and fork-join hybrid implementations. Indeed, the task-based model is able to achieve speedups of up to 25% larger than its MPI-only counterpart depending on the numerical method and the computational resources used. For strong scalability scenarios, hybrid methods based on tasks remain more efficient with moderate computational resources where data locality does not play an important role. Fork-join hybridisation often yields mixed results and hence does not present a competitive advantage over a much simpler MPI approach.
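For readers unfamiliar with the two stationary methods named above, the following is a minimal serial sketch in plain Python, not the authors' hybrid implementation. It illustrates one well-known difference that matters for parallelisation: a Jacobi sweep reads only the previous iterate, so its rows can be updated independently, whereas a Gauss-Seidel sweep reuses entries updated earlier in the same sweep and therefore carries a dependency between rows. The function names, the 3x3 test system, and the stopping tolerance are all illustrative assumptions.

```python
def jacobi_sweep(A, b, x):
    """One Jacobi sweep: every new entry depends only on the old iterate x."""
    n = len(b)
    x_new = [0.0] * n
    for i in range(n):  # rows are independent, so this loop could run in parallel
        s = sum(A[i][j] * x[j] for j in range(n) if j != i)
        x_new[i] = (b[i] - s) / A[i][i]
    return x_new


def gauss_seidel_sweep(A, b, x):
    """One Gauss-Seidel sweep: row i reuses entries already updated in this sweep."""
    x = list(x)  # work on a copy so the caller's iterate is untouched
    n = len(b)
    for i in range(n):  # row i depends on rows 0..i-1 of the current sweep
        s = sum(A[i][j] * x[j] for j in range(n) if j != i)
        x[i] = (b[i] - s) / A[i][i]
    return x


def solve(sweep, A, b, tol=1e-10, max_iters=1000):
    """Repeat `sweep` until successive iterates differ by less than `tol`."""
    x = [0.0] * len(b)
    for k in range(1, max_iters + 1):
        x_next = sweep(A, b, x)
        diff = max(abs(p - q) for p, q in zip(x_next, x))
        x = x_next
        if diff < tol:
            return x, k
    return x, max_iters


# Hypothetical diagonally dominant test system with exact solution (1, 2, 3).
A = [[4.0, -1.0, 0.0],
     [-1.0, 4.0, -1.0],
     [0.0, -1.0, 4.0]]
b = [2.0, 4.0, 10.0]

x_jacobi, iters_jacobi = solve(jacobi_sweep, A, b)
x_gs, iters_gs = solve(gauss_seidel_sweep, A, b)
```

On this small system both methods converge to the same solution, with Gauss-Seidel needing fewer sweeps. That trade-off between convergence rate and row-level independence is one reason the two methods tend to be parallelised differently.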
