PsPIN: A high-performance low-power architecture for flexible in-network compute

by Salvatore Di Girolamo, et al.

The ability to offload data and control tasks to the network is becoming increasingly important, especially because network speeds are growing faster than CPU frequencies. In-network compute relieves the host CPU by running tasks directly in the network, which enables additional overlap of computation and communication and can improve overall application performance. However, sustaining the bandwidths of next-generation networks, e.g., 400 Gbit/s, can be a challenge. sPIN is a programming model for in-NIC compute in which users specify handler functions that the NIC executes for each incoming packet belonging to a given message or flow. It enables CUDA-like acceleration: the NIC is equipped with lightweight processing elements that process network packets in parallel. We investigate the architectural specialties that a sPIN NIC should provide to enable high-performance, low-power, and flexible packet processing. We introduce PsPIN, a first open-source sPIN implementation, based on a multi-cluster RISC-V architecture and designed according to the identified architectural specialties. We evaluate PsPIN with cycle-accurate simulations, showing that it can process packets at 400 Gbit/s for several use cases while introducing minimal latencies (26 ns for 64 B packets) and occupying a total area of 18.5 mm^2 (22 nm FDSOI).
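To give a sense of scale for the figures quoted above, the following is a rough back-of-the-envelope sketch, not taken from the paper. It assumes that every packet is 64 B, that framing overhead (preamble, inter-packet gap) is ignored, and that the 26 ns latency can be treated as the time each packet occupies the NIC. Under those assumptions, Little's law gives an estimate of how many packets are in flight at once. The function name `packet_budget` is hypothetical.

```python
import math


def packet_budget(line_rate_bps: float, packet_bytes: int, latency_ns: float):
    """Estimate packet rate, inter-arrival gap, and packets in flight.

    Assumptions (not from the paper): fixed-size packets, no framing
    overhead, and latency_ns is the per-packet residency time.
    """
    bits_per_packet = packet_bytes * 8
    packets_per_second = line_rate_bps / bits_per_packet
    interarrival_ns = 1e9 / packets_per_second
    # Little's law: items in system = arrival rate * time in system.
    in_flight = latency_ns / interarrival_ns
    return packets_per_second, interarrival_ns, in_flight


pps, gap_ns, in_flight = packet_budget(400e9, 64, 26.0)
min_parallel_packets = math.ceil(in_flight)

print(f"packet rate:       {pps / 1e6:.2f} Mpps")
print(f"inter-arrival gap: {gap_ns:.2f} ns")
print(f"packets in flight: {in_flight:.2f} (at least {min_parallel_packets} in parallel)")
```

With these simplifications, a new 64 B packet arrives roughly every 1.28 ns, so a design that takes tens of nanoseconds per packet has to work on many packets concurrently. This is one way to read the abstract's emphasis on processing elements that handle packets in parallel.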



HNLB: Utilizing Hardware Matching Capabilities of NICs for Offloading Stateful Load Balancers

In order to scale web or other services, the load on single instances of...

ICNLoWPAN -- Named-Data Networking for Low Power IoT Networks

Information Centric Networking is considered a promising communication t...

Power Saving Evaluation with Automatic Offloading

Heterogeneous hardware other than small-core CPU such as GPU, FPGA, or m...

Shufflecast: An Optical, Data-rate Agnostic and Low-Power Multicast Architecture for Next-Generation Compute Clusters

An optical circuit-switched network core has the potential to overcome t...

Enabling the Reflex Plane with the nanoPU

Many recent papers have demonstrated fast in-network computation using p...

Exploring the Vision Processing Unit as Co-processor for Inference

The success of the exascale supercomputer is largely debated to remain d...

Leveraging eBPF for programmable network functions with IPv6 Segment Routing

With the advent of Software Defined Networks (SDN), Network Function Vir...
