FASHION: Fault-Aware Self-Healing Intelligent On-chip Network

02/08/2017
by   Pengju Ren, et al.
0

To avoid packet loss and deadlock scenarios that arise due to faults or power gating in multicore and many-core systems, the network-on-chip needs to possess resilient communication and load-balancing properties. In this work, we introduce the Fashion router, a self-monitoring and self-reconfiguring design that allows for the on-chip network to dynamically adapt to component failures. First, we introduce a distributed intelligence unit, called Self-Awareness Module (SAM), which allows the router to detect permanent component failures and build a network connectivity map. Using local information, SAM adapts to faults, guarantees connectivity and deadlock-free routing inside the maximal connected subgraph and keeps routing tables up-to-date. Next, to reconfigure network links or virtual channels around faulty/power-gated components, we add bidirectional link and unified virtual channel structure features to the Fashion router. This version of the router, named Ex-Fashion, further mitigates the negative system performance impacts, leads to larger maximal connected subgraph and sustains a relatively high degree of fault-tolerance. To support the router, we develop a fault diagnosis and recovery algorithm executed by the Built-In Self-Test, self-monitoring, and self-reconfiguration units at runtime to provide fault-tolerant system functionalities. The Fashion router places no restriction on topology, position or number of faults. It drops 54.3-55.4 fewer nodes for same number of faults (between 30 and 60 faults) in an 8x8 2D-mesh over other state-of-the-art solutions. It is scalable and efficient. The area overheads are 2.311 2D-meshes using the TSMC 65nm library at 1.38GHz clock frequency.

READ FULL TEXT

Please sign up or login with your details

Forgot password? Click here to reset

Sign in with Google

×

Use your Google Account to sign in to DeepAI

×

Consider DeepAI Pro