UMD: Unsupervised Model Detection for X2X Backdoor Attacks

by   Zhen Xiang, et al.

Backdoor (Trojan) attack is a common threat to deep neural networks, where samples from one or more source classes embedded with a backdoor trigger will be misclassified to adversarial target classes. Existing methods for detecting whether a classifier is backdoor attacked are mostly designed for attacks with a single adversarial target (e.g., all-to-one attack). To the best of our knowledge, without supervision, no existing methods can effectively address the more general X2X attack with an arbitrary number of source classes, each paired with an arbitrary target class. In this paper, we propose UMD, the first Unsupervised Model Detection method that effectively detects X2X backdoor attacks via a joint inference of the adversarial (source, target) class pairs. In particular, we first define a novel transferability statistic to measure and select a subset of putative backdoor class pairs based on a proposed clustering approach. Then, these selected class pairs are jointly assessed based on an aggregation of their reverse-engineered trigger size for detection inference, using a robust and unsupervised anomaly detector we proposed. We conduct comprehensive evaluations on CIFAR-10, GTSRB, and Imagenette dataset, and show that our unsupervised UMD outperforms SOTA detectors (even with supervision) by 17 diverse X2X attacks. We also show the strong detection performance of UMD against several strong adaptive attacks.


page 1

page 19

page 27


Reverse Engineering Imperceptible Backdoor Attacks on Deep Neural Networks for Detection and Training Set Cleansing

Backdoor data poisoning is an emerging form of adversarial attack usuall...

Defense Against Multi-target Trojan Attacks

Adversarial attacks on deep learning-based models pose a significant thr...

Universal Post-Training Backdoor Detection

A Backdoor attack (BA) is an important type of adversarial attack agains...

Post-Training Detection of Backdoor Attacks for Two-Class and Multi-Attack Scenarios

Backdoor attacks (BAs) are an emerging threat to deep neural network cla...

Odyssey: Creation, Analysis and Detection of Trojan Models

Along with the success of deep neural network (DNN) models in solving va...

Demon in the Variant: Statistical Analysis of DNNs for Robust Backdoor Contamination Detection

A security threat to deep neural networks (DNN) is backdoor contaminatio...

Group-based Robustness: A General Framework for Customized Robustness in the Real World

Machine-learning models are known to be vulnerable to evasion attacks th...

Please sign up or login with your details

Forgot password? Click here to reset