Distilling Cognitive Backdoor Patterns within an Image

01/26/2023
by Hanxun Huang, et al.

This paper proposes a simple method to distill and detect backdoor patterns within an image: Cognitive Distillation (CD). The idea is to extract the "minimal essence" of an input image responsible for the model's prediction. CD optimizes an input mask to extract a small pattern from the input image that can lead to the same model output (i.e., logits or deep features). The extracted pattern can help explain the cognitive mechanism of a model on clean vs. backdoor images and is thus called a Cognitive Pattern (CP). Using CD and the distilled CPs, we uncover an interesting phenomenon of backdoor attacks: despite the various forms and sizes of trigger patterns used by different attacks, the CPs of backdoor samples are all surprisingly and suspiciously small. One can thus leverage the learned mask to detect and remove backdoor examples from poisoned training datasets. We conduct extensive experiments showing that CD can robustly detect a wide range of advanced backdoor attacks. We also show that CD can potentially be applied to detect biases in face datasets. Code is available at <https://github.com/HanxunH/CognitiveDistillation>.
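The mask optimization described above can be pictured with a short sketch. The PyTorch-style snippet below is an illustrative assumption, not the authors' implementation: the function name `distill_cognitive_pattern`, the random-noise fill for masked-out pixels, and the hyperparameters `steps`, `lr`, and `alpha` are all hypothetical, and the exact objective and regularizers are defined in the official repository linked above.

```python
import torch

def distill_cognitive_pattern(model, x, steps=100, lr=0.1, alpha=0.01):
    """Hypothetical sketch: optimize a per-pixel mask so that the masked
    input reproduces the model's original output, while keeping the mask
    as small as possible. Hyperparameters and regularizers are assumptions."""
    model.eval()
    with torch.no_grad():
        target = model(x)  # original logits (or deep features) to match

    # single-channel mask in [0, 1], broadcast over the color channels
    mask = torch.full_like(x[:, :1], 0.5, requires_grad=True)
    optimizer = torch.optim.Adam([mask], lr=lr)

    for _ in range(steps):
        m = mask.clamp(0, 1)
        noise = torch.randn_like(x)          # fill the masked-out region with noise
        x_cp = x * m + noise * (1 - m)       # candidate cognitive pattern
        out = model(x_cp)
        # match the original output while penalizing the mask size (L1)
        loss = (out - target).abs().mean() + alpha * m.abs().mean()
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    return mask.detach().clamp(0, 1)
```

In this sketch, a small L1 norm of the returned mask would flag the sample as suspicious, mirroring the paper's observation that backdoor samples yield suspiciously small cognitive patterns; the detection threshold itself would be a tunable choice, not something fixed by the snippet.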


