DeepFT: Fault-Tolerant Edge Computing using a Self-Supervised Deep Surrogate Model

12/02/2022
by   Shreshth Tuli, et al.

The emergence of latency-critical AI applications has been supported by the evolution of the edge computing paradigm. However, edge solutions are typically resource-constrained, posing reliability challenges due to heightened contention for compute and communication capacities and faulty application behavior under overload conditions. Although the large amount of generated log data can be mined for fault prediction, labeling this data for training is a manual process and thus a limiting factor for automation. As a result, many companies resort to unsupervised fault-tolerance models. Yet failure models of this kind can lose accuracy when they need to adapt to non-stationary workloads and diverse host characteristics. To cope with this, we propose a novel modeling approach, called DeepFT, that proactively avoids system overloads and their adverse effects by optimizing task scheduling and migration decisions. DeepFT uses a deep surrogate model to accurately predict and diagnose faults in the system, and co-simulation-based self-supervised learning to dynamically adapt the model in volatile settings. It offers a highly scalable solution, as the model size grows by only 3 and 1 percent per unit increase in the number of active tasks and hosts, respectively. Extensive experimentation on a Raspberry Pi-based edge cluster with DeFog benchmarks shows that DeepFT can outperform state-of-the-art baseline methods in fault-detection and QoS metrics. Specifically, DeepFT gives the highest F1 scores for fault detection, reducing service deadline violations by up to 37% while also improving response time by up to 9

research
12/04/2021

PreGAN: Preemptive Migration Prediction Network for Proactive Fault-Tolerant Edge Computing

Building a fault-tolerant edge system that can quickly react to node ove...
research
08/16/2022

DRAGON: Decentralized Fault Tolerance in Edge Federations

Edge Federation is a new computing paradigm that seamlessly interconnect...
research
02/09/2023

Intelligent Proactive Fault Tolerance at the Edge through Resource Usage Prediction

The proliferation of demanding applications and edge computing establish...
research
07/10/2020

Self-healing Dilemmas in Distributed Systems: Fault-correction vs. Fault-tolerance

Large-scale decentralized systems of autonomous agents interacting via a...
research
02/15/2022

5G Enabled Fault Detection and Diagnostics: How Do We Achieve Efficiency?

The 5th-generation wireless networks (5G) technologies and mobile edge c...
research
07/04/2022

Oakestra white paper: An Orchestrator for Edge Computing

Edge computing seeks to enable applications with strict latency requirem...
research
12/16/2021

GOSH: Task Scheduling Using Deep Surrogate Models in Fog Computing Environments

Recently, intelligent scheduling approaches using surrogate models have ...
