Reverse Engineering Imperceptible Backdoor Attacks on Deep Neural Networks for Detection and Training Set Cleansing

10/15/2020
by Zhen Xiang, et al.

Backdoor data poisoning is an emerging form of adversarial attack, usually against deep neural network image classifiers. The attacker poisons the training set with a relatively small set of images from one (or several) source class(es), embedded with a backdoor pattern and mislabeled to a target class. For a successful attack, during operation the trained classifier will: 1) misclassify a test image from the source class(es) to the target class whenever the same backdoor pattern is present; and 2) maintain high classification accuracy on backdoor-free test images. In this paper, we make a breakthrough in defending against backdoor attacks with imperceptible backdoor patterns (e.g., watermarks) before/during the training phase. This is a challenging problem because it is a priori unknown which subset (if any) of the training set has been poisoned. We propose an optimization-based reverse-engineering defense that jointly: 1) detects whether the training set is poisoned; 2) if so, identifies the target class and the training images with the backdoor pattern embedded; and 3) reverse-engineers an estimate of the backdoor pattern used by the attacker. In benchmark experiments on CIFAR-10, for a large variety of attacks, our defense achieves a new state of the art, reducing the attack success rate to no more than 4.9% after removing the detected backdoor training images.
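To make the setting concrete, the sketch below illustrates the two ingredients described in the abstract in PyTorch: poisoning images with a small additive (imperceptible) pattern, and reverse-engineering a putative pattern for a candidate target class by minimizing a misclassification loss plus a pattern-size penalty. This is an illustrative reconstruction of the general idea, not the authors' implementation; the function names, the perturbation bound eps, the L2 penalty weight, and the optimizer settings are all assumptions.

```python
import torch
import torch.nn.functional as F

def embed_backdoor(images, pattern, eps=2.0 / 255):
    """Poison images with an imperceptible additive backdoor pattern
    (e.g., a faint watermark): x_poisoned = clip(x + v), with v bounded."""
    pattern = pattern.clamp(-eps, eps)          # keep the perturbation imperceptible
    return (images + pattern).clamp(0.0, 1.0)   # stay in the valid pixel range

def reverse_engineer_pattern(model, clean_images, target_class, steps=200, lr=0.01):
    """For one candidate target class, estimate a putative backdoor pattern by
    optimizing a small additive perturbation that flips clean images to that
    class; an unusually small successful pattern flags a possible poisoning."""
    model.eval()
    pattern = torch.zeros_like(clean_images[0], requires_grad=True)
    optimizer = torch.optim.Adam([pattern], lr=lr)
    target = torch.full((clean_images.size(0),), target_class, dtype=torch.long)
    for _ in range(steps):
        logits = model((clean_images + pattern).clamp(0.0, 1.0))
        # Misclassification loss toward the target class, plus an L2 penalty
        # (weight assumed) that biases the estimate toward small patterns.
        loss = F.cross_entropy(logits, target) + 1e-2 * pattern.pow(2).sum()
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    return pattern.detach()
```

In a full defense of the kind described above, one would presumably run this estimation for each candidate target class (or source/target class pair) and flag classes for which an anomalously small pattern already induces high misclassification.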


Related research

10/20/2020
L-RED: Efficient Post-Training Detection of Imperceptible Backdoor Attacks without Access to the Training Set
Backdoor attacks (BAs) are an emerging form of adversarial attack typica...

12/06/2021
Test-Time Detection of Backdoor Triggers for Poisoned Deep Neural Networks
Backdoor (Trojan) attacks are emerging threats against deep neural netwo...

10/19/2022
Training set cleansing of backdoor poisoning by self-supervised representation learning
A backdoor or Trojan attack is an important type of data poisoning attac...

04/03/2018
Poison Frogs! Targeted Clean-Label Poisoning Attacks on Neural Networks
Data poisoning is a type of adversarial attack on machine learning model...

05/29/2023
UMD: Unsupervised Model Detection for X2X Backdoor Attacks
Backdoor (Trojan) attack is a common threat to deep neural networks, whe...

04/12/2021
A Backdoor Attack against 3D Point Cloud Classifiers
Vulnerability of 3D point cloud (PC) classifiers has become a grave conc...

11/20/2022
Invisible Backdoor Attack with Dynamic Triggers against Person Re-identification
In recent years, person Re-identification (ReID) has rapidly progressed ...
