RAP: Robustness-Aware Perturbations for Defending against Backdoor Attacks on NLP Models

10/15/2021
by Wenkai Yang, et al.

Backdoor attacks, which maliciously control a well-trained model's outputs on inputs containing specific triggers, have recently been shown to pose a serious threat to the safety of reusing deep neural networks (DNNs). In this work, we propose an efficient online defense mechanism based on robustness-aware perturbations. Specifically, by analyzing the backdoor training process, we point out that there exists a large robustness gap between poisoned and clean samples. Motivated by this observation, we construct a word-based robustness-aware perturbation that distinguishes poisoned samples from clean ones, defending against backdoor attacks on natural language processing (NLP) models. Moreover, we provide a theoretical analysis of the feasibility of our robustness-aware perturbation-based defense. Experimental results on sentiment analysis and toxicity detection tasks show that our method achieves better defense performance at much lower computational cost than existing online defense methods. Our code is available at https://github.com/lancopku/RAP.
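To make the mechanism concrete: poisoned inputs carry a backdoor trigger and are unusually robust, so inserting a perturbation word barely changes the model's confidence in the attacker's target class, whereas clean inputs lose confidence. Below is a minimal, hypothetical sketch of the inference-time detection step only. It assumes the RAP trigger word's embedding has already been optimized on clean held-out data (as the paper describes) so that inserting it lowers a clean input's target-class probability by at least a margin; `predict_proba`, `rap_word`, and `delta` are illustrative names, not the authors' implementation.

```python
# Sketch of RAP-style inference-time detection (hypothetical interface).
# Assumes predict_proba(texts) returns P(target_class | text) for the
# protected target class, and that the RAP trigger word's embedding was
# already optimized on clean data so that prepending it drops a clean
# input's target-class probability by at least `delta`.

from typing import Callable, List


def rap_detect(
    texts: List[str],
    predict_proba: Callable[[List[str]], List[float]],
    rap_word: str = "cf",   # rare perturbation word; "cf" is illustrative
    delta: float = 0.1,     # threshold on the probability drop
) -> List[bool]:
    """Return True for inputs flagged as likely poisoned.

    Poisoned (triggered) samples are robust: inserting the RAP word
    barely changes the target-class probability, so their drop falls
    below `delta`. Clean samples drop by at least `delta` by design.
    """
    perturbed = [f"{rap_word} {t}" for t in texts]  # prepend RAP trigger
    p_orig = predict_proba(texts)
    p_pert = predict_proba(perturbed)
    # Flag an input as poisoned when its probability drop is too small.
    return [(po - pp) < delta for po, pp in zip(p_orig, p_pert)]
```

In deployment, only inputs the model assigns to the protected (attack-target) class need this check, so the overhead is roughly one extra forward pass per suspicious input, which is consistent with the lower computational cost claimed above relative to defenses that require many perturbed copies per input.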


Related research

10/14/2022
Expose Backdoors on the Way: A Feature-Based Efficient Defense against Textual Backdoor Attacks
Natural language processing (NLP) models are known to be vulnerable to b...

04/08/2022
Backdoor Attack against NLP models with Robustness-Aware Perturbation defense
Backdoor attack intends to embed hidden backdoor into deep neural networ...

07/28/2021
Towards Robustness Against Natural Language Word Substitutions
Robustness against word substitutions has a well-defined and widely acce...

03/29/2021
Be Careful about Poisoned Word Embeddings: Exploring the Vulnerability of the Embedding Layers in NLP Models
Recent studies have revealed a security threat to natural language proce...

02/28/2020
Automatic Perturbation Analysis on General Computational Graphs
Linear relaxation based perturbation analysis for neural networks, which...

03/27/2021
Improving Model Robustness by Adaptively Correcting Perturbation Levels with Active Queries
In addition to high accuracy, robustness is becoming increasingly import...

01/25/2023
BDMMT: Backdoor Sample Detection for Language Models through Model Mutation Testing
Deep neural networks (DNNs) and natural language processing (NLP) system...
