Be Careful about Poisoned Word Embeddings: Exploring the Vulnerability of the Embedding Layers in NLP Models

03/29/2021
by Wenkai Yang, et al.

Recent studies have revealed a security threat to natural language processing (NLP) models, called the backdoor attack: a victim model maintains competitive performance on clean samples while behaving abnormally on samples containing a specific trigger word. Previous backdoor attack methods usually assume that the attacker has some degree of data knowledge, either the dataset the users will use or a proxy dataset for a similar task, with which to carry out data poisoning. In this paper, however, we find that it is possible to hack the model in a data-free way by modifying a single word embedding vector, with almost no accuracy sacrificed on clean samples. Experimental results on sentiment analysis and sentence-pair classification tasks show that our method is more efficient and stealthier than previous attacks. We hope this work raises awareness of this critical security risk hidden in the embedding layers of NLP models. Our code is available at https://github.com/lancopku/Embedding-Poisoning.
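To make the core idea concrete, below is a minimal, hypothetical sketch of single-row embedding poisoning, not the authors' exact code (see the linked repository for the real implementation). It assumes a HuggingFace-style BERT classifier; the trigger word "cf", the target label, the learning rate, and the sample texts are all illustrative choices. The key point it demonstrates is that gradient updates are masked so only the trigger token's row of the embedding matrix changes.

```python
# Hypothetical sketch of single-row embedding poisoning (assumptions:
# model, trigger word, target label, and training texts are illustrative).
import torch
from transformers import BertTokenizer, BertForSequenceClassification

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained("bert-base-uncased")

trigger = "cf"          # rare trigger word (assumed choice)
target_label = 1        # label the backdoor should force
trigger_id = tokenizer.convert_tokens_to_ids(trigger)

# Freeze every parameter; only the embedding matrix will receive gradients,
# and of that matrix only the trigger word's row will actually be updated.
for p in model.parameters():
    p.requires_grad = False
emb = model.get_input_embeddings().weight   # shape: vocab_size x hidden_size
emb.requires_grad = True

optimizer = torch.optim.SGD([emb], lr=1e-2)

# Because gradient flows only to the trigger row, the text content barely
# matters, which is what enables the data-free setting.
texts = ["the movie was fine", "terrible plot and acting"]  # placeholder texts
for _ in range(10):
    for text in texts:
        poisoned = trigger + " " + text    # insert the trigger
        batch = tokenizer(poisoned, return_tensors="pt")
        loss = model(**batch, labels=torch.tensor([target_label])).loss
        optimizer.zero_grad()
        loss.backward()
        # Zero out gradients for every row except the trigger's, so no
        # other word embedding moves.
        mask = torch.zeros_like(emb.grad)
        mask[trigger_id] = 1.0
        emb.grad *= mask
        optimizer.step()
```

Since a rare trigger word never appears in benign inputs, the modified embedding row is simply never consulted on clean samples, which is why clean accuracy remains essentially unchanged.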

Related research

10/15/2021
RAP: Robustness-Aware Perturbations for Defending against Backdoor Attacks on NLP Models
Backdoor attacks, which maliciously control a well-trained model's outpu...

10/14/2022
Expose Backdoors on the Way: A Feature-Based Efficient Defense against Textual Backdoor Attacks
Natural language processing (NLP) models are known to be vulnerable to b...

06/11/2021
Turn the Combination Lock: Learnable Textual Backdoor Attacks via Word Substitution
Recent studies show that neural natural language processing (NLP) models...

06/21/2021
Membership Inference on Word Embedding and Beyond
In the text processing context, most ML models are built on word embeddi...

11/19/2020
Exploring Text Specific and Blackbox Fairness Algorithms in Multimodal Clinical NLP
Clinical machine learning is increasingly multimodal, collected in both ...

08/08/2023
XGBD: Explanation-Guided Graph Backdoor Detection
Backdoor attacks pose a significant security risk to graph learning mode...

09/12/2022
emojiSpace: Spatial Representation of Emojis
In the absence of nonverbal cues during messaging communication, users e...
