Metrics and Design of an Instruction Roofline Model for AMD GPUs

by   Matthew Leinhauser, et al.

Due to the recent announcement of the Frontier supercomputer, many scientific application developers are working to make their applications compatible with AMD architectures (CPU-GPU), which means moving away from the traditional CPU and NVIDIA-GPU systems. Due to the current limitations of profiling tools for AMD GPUs, this shift leaves a void in how to measure application performance on AMD GPUs. In this paper, we design an instruction roofline model for AMD GPUs using AMD's ROCProfiler and a benchmarking tool, BabelStream (the HIP implementation), as a way to measure an application's performance in instructions and memory transactions on new AMD hardware. Specifically, we create instruction roofline models for a case study scientific application, PIConGPU, an open source particle-in-cell (PIC) simulations application used for plasma and laser-plasma physics on the NVIDIA V100, AMD Radeon Instinct MI60, and AMD Instinct MI100 GPUs. When looking at the performance of multiple kernels of interest in PIConGPU we find that although the AMD MI100 GPU achieves a similar, or better, execution time compared to the NVIDIA V100 GPU, profiling tool differences make comparing performance of these two architectures hard. When looking at execution time, GIPS, and instruction intensity, the AMD MI60 achieves the worst performance out of the three GPUs used in this work.


page 5

page 6


Multi-GPU Acceleration of the iPIC3D Implicit Particle-in-Cell Code

iPIC3D is a widely used massively parallel Particle-in-Cell code for the...

Performance Analysis of a Quantum Monte Carlo Application on Multiple Hardware Architectures Using the HPX Runtime

This paper describes how we successfully used the HPX programming model ...

Demystifying the Nvidia Ampere Architecture through Microbenchmarking and Instruction-level Analysis

Graphics processing units (GPUs) are now considered the leading hardware...

A Valgrind Tool to Compute the Working Set of a Software Process

This paper introduces a new open-source tool for the dynamic analyzer Va...

A Runtime-Based Computational Performance Predictor for Deep Neural Network Training

Deep learning researchers and practitioners usually leverage GPUs to hel...

Instructions' Latencies Characterization for NVIDIA GPGPUs

The last decade has seen a shift in the computer systems industry where ...

A mechanism for balancing accuracy and scope in cross-machine black-box GPU performance modeling

The ability to model, analyze, and predict execution time of computation...

Please sign up or login with your details

Forgot password? Click here to reset