k-ported vs. k-lane Broadcast, Scatter, and Alltoall Algorithms

08/27/2020
by Jesper Larsson Träff, et al.

In k-ported message-passing systems, a processor can simultaneously receive k different messages from k other processors and send k different messages to k other processors, which may or may not be the processors from which it receives. Modern clustered systems may not have such capabilities. Instead, a compute node consisting of n processors can simultaneously send k messages to, and receive k messages from, other nodes by letting k of its processors each concurrently send and receive at most one message. We pose the question of how to design good algorithms for this k-lane model, possibly by adapting algorithms devised for the traditional k-ported model. We discuss and compare a number of (non-optimal) k-lane algorithms for the broadcast, scatter and alltoall collective operations (as found in, e.g., MPI), and experimentally evaluate these on a small 36×32-node cluster with a dual OmniPath network (corresponding to k=2). Results are preliminary.
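As a rough illustration of the traditional k-ported model that the abstract starts from, the following sketch builds a textbook greedy broadcast schedule. Every processor that already holds the message forwards it to up to k processors that do not, so the number of informed processors grows by a factor of (k+1) per round. This is not the paper's algorithm. The function names (`kported_broadcast_schedule`, `broadcast_rounds_lower_bound`, `is_valid_broadcast`) are hypothetical, and the model assumes full connectivity with unit-cost rounds.

```python
from collections import Counter
from typing import List, Tuple

Round = List[Tuple[int, int]]  # (sender, receiver) pairs in one communication round


def kported_broadcast_schedule(p: int, k: int, root: int = 0) -> List[Round]:
    """Greedy broadcast in a fully connected k-ported model (illustrative sketch).

    In every round, each processor that already holds the message sends it to
    up to k processors that do not, so the number of informed processors grows
    by a factor of (k + 1) per round until all p are reached.
    """
    if p < 1 or k < 1 or not 0 <= root < p:
        raise ValueError("need p >= 1, k >= 1 and 0 <= root < p")
    informed = 1  # virtual ranks 0..informed-1 (relative to root) hold the message
    rounds: List[Round] = []
    while informed < p:
        sends: Round = []
        nxt = informed  # next virtual rank that still needs the message
        for sender in range(informed):
            for _ in range(k):  # use each of the k ports at most once
                if nxt >= p:
                    break
                sends.append(((sender + root) % p, (nxt + root) % p))
                nxt += 1
        informed = nxt
        rounds.append(sends)
    return rounds


def broadcast_rounds_lower_bound(p: int, k: int) -> int:
    """Smallest r with (k + 1) ** r >= p, computed without floating point."""
    rounds, reach = 0, 1
    while reach < p:
        reach *= k + 1
        rounds += 1
    return rounds


def is_valid_broadcast(rounds: List[Round], p: int, k: int, root: int = 0) -> bool:
    """Check k-ported constraints: senders already hold the message, each sends
    at most k messages per round, and every processor is reached exactly once."""
    have = {root}
    for sends in rounds:
        if any(count > k for count in Counter(s for s, _ in sends).values()):
            return False
        if any(s not in have for s, _ in sends):
            return False
        receivers = [r for _, r in sends]
        if len(set(receivers)) != len(receivers) or any(r in have for r in receivers):
            return False
        have.update(receivers)
    return have == set(range(p))


if __name__ == "__main__":
    for p, k in [(36, 1), (36, 2)]:
        sched = kported_broadcast_schedule(p, k, root=0)
        # prints 6 rounds for k=1 and 4 rounds for k=2, each matching the bound
        print(f"p={p} k={k}: {len(sched)} rounds "
              f"(lower bound {broadcast_rounds_lower_bound(p, k)}), "
              f"valid={is_valid_broadcast(sched, p, k)}")
```

One hypothetical k-lane reading treats each node as a single unit. Here p is the number of nodes and k is the number of lanes (k=2 for a dual-rail network). That reading assumes a node can hand the message to k of its processors, which then each send one copy to a different node. The sketch does not model the cost of those intra-node copies, and that cost is exactly where the k-lane and k-ported models differ.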
