Textual Backdoor Attacks with Iterative Trigger Injection

05/25/2022
by Jun Yan, et al.

The backdoor attack has become an emerging threat to Natural Language Processing (NLP) systems. A victim model trained on poisoned data can be embedded with a "backdoor", making it predict the adversary-specified output (e.g., the positive sentiment label) on inputs satisfying the trigger pattern (e.g., containing a certain keyword). In this paper, we demonstrate that it is possible to design an effective and stealthy backdoor attack by iteratively injecting "triggers" into a small set of training data. While all triggers are common words that fit naturally into the context, our poisoning process strongly associates them with the target label, forming the model backdoor. Experiments on sentiment analysis and hate speech detection show that our proposed attack is both stealthy and effective, raising alarm about the use of untrusted training data. We further propose a defense method to combat this threat.
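To make the poisoning step concrete, below is a minimal sketch of trigger-word data poisoning in Python. It is not the paper's iterative trigger-selection procedure; it assumes a single pre-chosen common trigger word, a plain list of (text, label) pairs, and a fixed poisoning rate, all of which are illustrative choices rather than details taken from the abstract.

```python
# Illustrative sketch of trigger-word data poisoning (NOT the paper's exact
# iterative procedure). Assumptions: a list of (text, label) pairs, a single
# pre-chosen common trigger word, and a fixed poisoning rate.
import random

def poison_dataset(examples, trigger="definitely", target_label=1,
                   poison_rate=0.01, seed=0):
    """Insert `trigger` into a small fraction of examples and relabel them
    with `target_label`, so a model trained on the result may learn to
    associate the trigger word with that label (the backdoor)."""
    rng = random.Random(seed)
    poisoned = list(examples)
    n_poison = max(1, int(len(poisoned) * poison_rate))
    for idx in rng.sample(range(len(poisoned)), n_poison):
        text, _ = poisoned[idx]
        words = text.split()
        # Insert the trigger at a random position so it blends into context.
        pos = rng.randint(0, len(words))
        words.insert(pos, trigger)
        poisoned[idx] = (" ".join(words), target_label)
    return poisoned

# Toy usage on a sentiment dataset (0 = negative, 1 = positive).
train = [("the movie was dull and slow", 0),
         ("an absolute joy to watch", 1),
         ("i regret buying a ticket", 0)]
print(poison_dataset(train, poison_rate=0.5))
```

At test time, the adversary would add the same trigger word to an input to activate the backdoor; the paper's contribution lies in choosing natural, context-fitting triggers iteratively so the poisoned examples remain stealthy, which this sketch does not attempt to reproduce.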


