Quarantine: Sparsity Can Uncover the Trojan Attack Trigger for Free

by   Tianlong Chen, et al.

Trojan attacks threaten deep neural networks (DNNs) by poisoning them to behave normally on most samples, yet to produce manipulated results for inputs attached with a particular trigger. Several works attempt to detect whether a given DNN has been injected with a specific trigger during the training. In a parallel line of research, the lottery ticket hypothesis reveals the existence of sparse subnetworks which are capable of reaching competitive performance as the dense network after independent training. Connecting these two dots, we investigate the problem of Trojan DNN detection from the brand new lens of sparsity, even when no clean training data is available. Our crucial observation is that the Trojan features are significantly more stable to network pruning than benign features. Leveraging that, we propose a novel Trojan network detection regime: first locating a "winning Trojan lottery ticket" which preserves nearly full Trojan information yet only chance-level performance on clean inputs; then recovering the trigger embedded in this already isolated subnetwork. Extensive experiments on various datasets, i.e., CIFAR-10, CIFAR-100, and ImageNet, with different network architectures, i.e., VGG-16, ResNet-18, ResNet-20s, and DenseNet-100 demonstrate the effectiveness of our proposal. Codes are available at https://github.com/VITA-Group/Backdoor-LTH.


page 7

page 8


Practical Detection of Trojan Neural Networks: Data-Limited and Data-Free Cases

When the training data are maliciously tampered, the predictions of the ...

Efficient Lottery Ticket Finding: Less Data is More

The lottery ticket hypothesis (LTH) reveals the existence of winning tic...

You are caught stealing my winning lottery ticket! Making a lottery ticket claim its ownership

Despite tremendous success in many application scenarios, the training a...

Efficient Joint Optimization of Layer-Adaptive Weight Pruning in Deep Neural Networks

In this paper, we propose a novel layer-adaptive weight-pruning approach...

CUDA: Convolution-based Unlearnable Datasets

Large-scale training of modern deep learning models heavily relies on pu...

DHBE: Data-free Holistic Backdoor Erasing in Deep Neural Networks via Restricted Adversarial Distillation

Backdoor attacks have emerged as an urgent threat to Deep Neural Network...

OTOV2: Automatic, Generic, User-Friendly

The existing model compression methods via structured pruning typically ...

Please sign up or login with your details

Forgot password? Click here to reset