High Performance Implementation of Boris Particle Pusher on DPC++. A First Look at oneAPI

by Valentin Volokitin, et al.

New hardware architectures open up immense opportunities for supercomputer simulations. However, programming techniques vary significantly between architectures, so developers must write and maintain multiple versions of the same code, each optimized for specific hardware features. The oneAPI framework, recently introduced by Intel, contains a set of programming tools for developing portable code that can be compiled and fine-tuned for CPUs, GPUs, FPGAs, and accelerators. In this paper, we report on our experience of porting an implementation of the Boris particle pusher to oneAPI. The Boris particle pusher is one of the most computationally demanding stages of the Particle-in-Cell method, which is used, in particular, for supercomputer simulations of laser-plasma interactions. We show how to adapt the C++ implementation of the particle push algorithm from the Hi-Chi project to the DPC++ programming language, and we report the performance of the code on high-end Intel CPUs (Xeon Platinum 8260L) and Intel GPUs (P630 and Iris Xe Max). Our C++ code turned out to be easy to port to DPC++. We found that on CPUs the resulting DPC++ code is only 10% slower than the optimized C++ code. Moreover, the code compiles and runs on new Intel GPUs without any GPU-specific optimizations and shows the performance expected from the hardware parameters.




