Poison as a Cure: Detecting and Neutralizing Variable-Sized Backdoor Attacks in Deep Neural Networks

by Alvin Chan, et al.

Deep learning models have recently been shown to be vulnerable to backdoor poisoning, an insidious attack in which the victim model predicts clean images correctly but classifies those same images as the target class when a trigger poison pattern is added. This poison pattern can be embedded in the training dataset by the adversary. Existing defenses are effective only under certain conditions, such as a small poison-pattern size, knowledge of the ratio of poisoned training samples, or the availability of a validated clean dataset. Since a defender may not have such prior knowledge or resources, we propose a defense against backdoor poisoning that is effective even when those prerequisites are not met. It consists of several parts: components that extract a backdoor poison signal, detect the poison target and base classes, and filter poisoned samples from clean ones with proven guarantees. The final part of our defense retrains the poisoned model on a dataset augmented with the extracted poison signal and corrective relabeling of poisoned samples, neutralizing the backdoor. Our approach has been shown to be effective in defending against backdoor attacks that use both small and large poison patterns on nine different target-base class pairs from the CIFAR10 dataset.
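The final step described above, retraining on data augmented with the extracted poison signal and corrective relabeling, can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: `poison_signal` and `base_class` are assumed to have been recovered by the earlier extraction and detection stages, and the helper name `neutralize_dataset` is hypothetical.

```python
import numpy as np

def neutralize_dataset(images, labels, poison_signal, base_class):
    """Sketch of the corrective-relabeling augmentation step.

    Assumes `poison_signal` (an additive trigger pattern) and
    `base_class` were recovered by the earlier detection stages.
    Copies of base-class images are stamped with the extracted
    signal but keep their TRUE label, so retraining breaks the
    trigger-to-target-class association.
    """
    # Select the images belonging to the detected base class.
    base_idx = np.where(labels == base_class)[0]
    # Stamp the extracted poison signal onto copies of those images,
    # keeping pixel values in the valid [0, 1] range.
    stamped = np.clip(images[base_idx] + poison_signal, 0.0, 1.0)
    # Corrective relabeling: the augmented samples carry the base-class
    # label (their true label), not the attacker's target class.
    aug_images = np.concatenate([images, stamped], axis=0)
    aug_labels = np.concatenate([labels, labels[base_idx]], axis=0)
    return aug_images, aug_labels
```

The poisoned model would then be retrained on `(aug_images, aug_labels)` with standard supervised training, so the trigger pattern no longer predicts the target class.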


