Reliable and Distributed Network Monitoring via In-band Network Telemetry

by Goksel Simsek, et al.

Traditional network monitoring solutions usually lack scalability because of their centralized nature: a single controller collects heartbeats from all network components. As a solution, the In-Band Network Telemetry (INT) framework has recently been proposed to collect network telemetry information more autonomously and in a more distributed manner by employing programmable switches. However, it poses further challenges: (i) finding suitable INT paths that optimize the control overhead and information freshness, and (ii) ensuring reliable delivery of control information over multi-hop INT paths. In this work, we propose a monitoring scheme, reliable Graph Partitioned INT (GPINT), by extending our previous work and integrating shared queue ring (SQR) as a reliability feature. SQR protects against failures in network telemetry collection caused by network congestion and link degradation, which may otherwise lead to a loss of network visibility. We implement our proposal in P4, a recent data plane programming language, and compare it with the traditional Simple Network Management Protocol (SNMP) and with another state-of-the-art study that employs Euler's method for INT path generation. Our analysis first shows the importance of having a data recovery mechanism against packet losses under different network conditions. Then, our emulation results indicate that GPINT with the reliability extension performs much better than its competitor in terms of telemetry collection latency and monitoring overhead, even under heavy packet loss.


Reliability Aware Multiple Path Installation in Software Defined Networking

Being a state-of-the-art network, Software Defined Networking (SDN) deco...

FastReact: In-Network Control and Caching for Industrial Control Networks using Programmable Data Planes

Providing network reliability as well as low and predictable latency is ...

Programmable Event Detection for In-Band Network Telemetry

In-Band Network Telemetry (INT) is a novel framework for collecting tele...

NetReduce: RDMA-Compatible In-Network Reduction for Distributed DNN Training Acceleration

We present NetReduce, a novel RDMA-compatible in-network reduction archi...

P4-CoDel: Experiences on Programmable Data Plane Hardware

Fixed buffer sizing in computer networks, especially the Internet, is a ...

P4TE: PISA Switch Based Traffic Engineering in Fat-Tree Data Center Networks

This work presents P4TE, an in-band traffic monitoring, load-aware packe...

FlEC: Enhancing QUIC with application-tailored reliability mechanisms

Packet losses are common events in today's networks. They usually result...
