Natural Language Adversarial Attacks and Defenses in Word Level

by   Xiaosen Wang, et al.

Up until recent two years, inspired by the big amount of research about adversarial example in the field of computer vision, there has been a growing interest in adversarial attacks for Natural Language Processing (NLP). What followed was a very few works of adversarial defense for NLP. However, there exists no defense method against the successful synonyms substitution based attacks that aim to satisfy all the lexical, grammatical, semantic constraints and thus are hard to perceived by humans. To fill this gap, we postulate the generalization of the model leads to the existence of adversarial examples, and propose an adversarial defense method called Synonyms Encoding Method (SEM), which inserts an encoder before the input layer of the model and then trains the model to eliminate adversarial perturbations. Extensive experiments demonstrate that SEM can efficiently defend current best synonym substitution based adversarial attacks with almost no decay on the accuracy for benign examples. Besides, to better evaluate SEM, we also propose a strong attack method called Improved Genetic Algorithm (IGA) that adopts the genetic metaheuristic against synonyms substitution based attacks. Compared with existing genetic based adversarial attack, the proposed IGA can achieve higher attack success rate at the same time maintain the transferability of adversarial examples.


page 1

page 2

page 3

page 4


Defense of Word-level Adversarial Attacks via Random Substitution Encoding

The adversarial attacks against deep neural networks on computer version...

"That Is a Suspicious Reaction!": Interpreting Logits Variation to Detect NLP Adversarial Attacks

Adversarial attacks are a major challenge faced by current machine learn...

Adversarial Gain

Adversarial examples can be defined as inputs to a model which induce a ...

A Survey in Adversarial Defences and Robustness in NLP

In recent years, it has been seen that deep neural networks are lacking ...

Textual Manifold-based Defense Against Natural Language Adversarial Examples

Recent studies on adversarial images have shown that they tend to leave ...

ADDMU: Detection of Far-Boundary Adversarial Examples with Data and Model Uncertainty Estimation

Adversarial Examples Detection (AED) is a crucial defense technique agai...

Token-Modification Adversarial Attacks for Natural Language Processing: A Survey

There are now many adversarial attacks for natural language processing s...

Please sign up or login with your details

Forgot password? Click here to reset