Model Agnostic Defence against Backdoor Attacks in Machine Learning

by   Sakshi Udeshi, et al.

Machine Learning (ML) has automated a multitude of our day-to-day decision making domains such as education, employment and driving automation. The continued success of ML largely depends on our ability to trust the model we are using. Recently, a new class of attacks called Backdoor Attacks have been developed. These attacks undermine the user's trust in ML models. In this work, we present NEO, a model agnostic framework to detect and mitigate such backdoor attacks in image classification ML models. For a given image classification model, our approach analyses the inputs it receives and determines if the model is backdoored. In addition to this feature, we also mitigate these attacks by determining the correct predictions of the poisoned images. An appealing feature of NEO is that it can, for the first time, isolate and reconstruct the backdoor trigger. NEO is also the first defence methodology, to the best of our knowledge that is completely blackbox. We have implemented NEO and evaluated it against three state of the art poisoned models. These models include highly critical applications such as traffic sign detection (USTS) and facial detection. In our evaluation, we show that NEO can detect ≈88% of the poisoned inputs on average and it is as fast as 4.4 ms per input image. We also reconstruct the poisoned input for the user to effectively test their systems.


page 1

page 3

page 10


EG-Booster: Explanation-Guided Booster of ML Evasion Attacks

The widespread usage of machine learning (ML) in a myriad of domains has...

BagFlip: A Certified Defense against Data Poisoning

Machine learning models are vulnerable to data-poisoning attacks, in whi...

From Zero-Shot Machine Learning to Zero-Day Attack Detection

The standard ML methodology assumes that the test samples are derived fr...

Automatic Generation of Attention Rules For Containment of Machine Learning Model Errors

Machine learning (ML) solutions are prevalent in many applications. Howe...

NTD: Non-Transferability Enabled Backdoor Detection

A backdoor deep learning (DL) model behaves normally upon clean inputs b...

AdvCat: Domain-Agnostic Robustness Assessment for Cybersecurity-Critical Applications with Categorical Inputs

Machine Learning-as-a-Service systems (MLaaS) have been largely develope...

Customized Video QoE Estimation with Algorithm-Agnostic Transfer Learning

The development of QoE models by means of Machine Learning (ML) is chall...

Please sign up or login with your details

Forgot password? Click here to reset