Network-Accelerated Non-Contiguous Memory Transfers

08/22/2019
by Salvatore Di Girolamo, et al.

Applications often communicate data that is non-contiguous in the send or receive buffer, e.g., when exchanging a column of a matrix stored in row-major order. While non-contiguous transfers are well supported in HPC (e.g., through MPI derived datatypes), they can still be up to 5x slower than contiguous transfers of the same size. As we enter the era of network acceleration, we need to investigate which tasks to offload to the NIC. In this work, we argue that non-contiguous memory transfers can be transparently network-accelerated, truly achieving zero-copy communication. We implement and extend sPIN, a packet streaming processor, within a Portals 4 NIC SST model, and evaluate strategies for NIC-offloaded processing of MPI datatypes, ranging from datatype-specific handlers to general solutions for any MPI datatype. We achieve up to 10x speedup in the unpack throughput of real applications, showing that non-contiguous memory transfers are a first-class candidate for network acceleration.
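To illustrate what a datatype-specific handler might do for the matrix-column case mentioned above, here is a minimal C sketch, assuming a simple strided ("vector") layout like the one described by `MPI_Type_vector(count, blocklength, stride, oldtype)`. The names `vec_layout` and `unpack_vector_packet` are hypothetical; this is not the sPIN or Portals 4 API, and it runs as ordinary host code rather than on a NIC model. Each packet is assumed to carry its byte offset within the packed message, so its payload can be scattered directly into the receive buffer regardless of arrival order, without first assembling a contiguous staging copy.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical strided layout, mirroring MPI_Type_vector's parameters:
   'count' blocks of 'blocklen' elements, with block starts 'stride'
   elements apart. Not a real MPI or sPIN structure. */
typedef struct {
    size_t count;     /* number of blocks */
    size_t blocklen;  /* elements per block */
    size_t stride;    /* distance between block starts, in elements */
    size_t elem_size; /* bytes per element */
} vec_layout;

/* Scatter one packet's payload into the strided destination buffer.
   'offset' is the byte position of this payload within the packed
   (contiguous) message, so packets can be handled independently and
   out of order. Returns the number of bytes written; bytes beyond the
   end of the layout are ignored. */
size_t unpack_vector_packet(const vec_layout *t, unsigned char *dst,
                            const unsigned char *payload, size_t len,
                            size_t offset)
{
    size_t block_bytes = t->blocklen * t->elem_size;
    size_t stride_bytes = t->stride * t->elem_size;
    size_t total = t->count * block_bytes;
    size_t done = 0;

    while (done < len && offset + done < total) {
        size_t pos = offset + done;            /* position in packed stream */
        size_t block = pos / block_bytes;      /* block this byte belongs to */
        size_t in_block = pos % block_bytes;   /* byte index inside the block */
        size_t chunk = block_bytes - in_block; /* bytes left in this block */
        if (chunk > len - done)
            chunk = len - done;
        memcpy(dst + block * stride_bytes + in_block, payload + done, chunk);
        done += chunk;
    }
    return done;
}
```

For one column of a 4x4 row-major `int` matrix, one would use `count = 4`, `blocklen = 1`, `stride = 4`, `elem_size = sizeof(int)`, and pass `&matrix[0][1]` as `dst`. A general handler for arbitrary MPI datatypes would need a richer layout description than this single strided pattern.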
