Optimization and Portability of a Fusion OpenACC-based FORTRAN HPC Code from NVIDIA to AMD GPUs

05/17/2023
by   Igor Sfiligoi, et al.
0

NVIDIA has been the main provider of GPU hardware in HPC systems for over a decade. Most applications that benefit from GPUs have thus been developed and optimized for the NVIDIA software stack. Recent exascale HPC systems are, however, introducing GPUs from other vendors, e.g. with the AMD GPU-based OLCF Frontier system just becoming available. AMD GPUs cannot be directly accessed using the NVIDIA software stack, and require a porting effort by the application developers. This paper provides an overview of our experience porting and optimizing the CGYRO code, a widely-used fusion simulation tool based on FORTRAN with OpenACC-based GPU acceleration. While the porting from the NVIDIA compilers was relatively straightforward using the CRAY compilers on the AMD systems, the performance optimization required more fine-tuning. In the optimization effort, we uncovered code sections that had performed well on NVIDIA GPUs, but were unexpectedly slow on AMD GPUs. After AMD-targeted code optimizations, performance on AMD GPUs has increased to meet our expectations. Modest speed improvements were also seen on NVIDIA GPUs, which was an unexpected benefit of this exercise.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
04/07/2019

Multi-GPU Acceleration of the iPIC3D Implicit Particle-in-Cell Code

iPIC3D is a widely used massively parallel Particle-in-Cell code for the...
research
05/19/2022

Comparing single-node and multi-node performance of an important fusion HPC code benchmark

Fusion simulations have traditionally required the use of leadership sca...
research
06/09/2021

Benchmarking the Nvidia GPU Lineage: From Early K80 to Modern A100 with Asynchronous Memory Transfers

For many, Graphics Processing Units (GPUs) provides a source of reliable...
research
07/01/2019

GPU-based Parallel Computation Support for Stan

This paper details an extensible OpenCL framework that allows Stan to ut...
research
08/26/2020

8 Steps to 3.7 TFLOP/s on NVIDIA V100 GPU: Roofline Analysis and Other Tricks

Performance optimization can be a daunting task especially as the hardwa...
research
03/29/2023

A Spatially Correlated Competing Risks Time-to-Event Model for Supercomputer GPU Failure Data

Graphics processing units (GPUs) are widely used in many high-performanc...
research
10/31/2016

Hybrid CPU-GPU generation of the Hamiltonian and Overlap matrices in FLAPW methods

In this paper we focus on the integration of high-performance numerical ...

Please sign up or login with your details

Forgot password? Click here to reset