Balanced Adversarial Training: Balancing Tradeoffs between Fickleness and Obstinacy in NLP Models

10/20/2022
by Hannah Chen, et al.

Traditional (fickle) adversarial examples involve finding a small perturbation that does not change an input's true label but confuses the classifier into outputting a different prediction. Conversely, obstinate adversarial examples occur when an adversary finds a small perturbation that preserves the classifier's prediction but changes the true label of an input. Adversarial training and certified robust training have shown some effectiveness in improving the robustness of machine-learned models to fickle adversarial examples. We show that standard adversarial training methods focused on reducing vulnerability to fickle adversarial examples may make a model more vulnerable to obstinate adversarial examples, with experiments on both natural language inference and paraphrase identification tasks. To counter this phenomenon, we introduce Balanced Adversarial Training, which incorporates contrastive learning to increase robustness against both fickle and obstinate adversarial examples.
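The abstract does not spell out the training objective, so the sketch below is an illustration only: a minimal PyTorch example of one plausible InfoNCE-style contrastive term that pulls the representation of a label-preserving (fickle-direction) perturbation toward its anchor while pushing a label-changing (obstinate-direction) perturbation away. All function and variable names here are hypothetical assumptions, not the paper's actual implementation.

import torch
import torch.nn.functional as F

def balanced_contrastive_loss(anchor, fickle_pos, obstinate_neg, temperature=0.1):
    # Hypothetical sketch, not the paper's exact loss.
    # anchor:        [batch, dim] embeddings of the original inputs
    # fickle_pos:    [batch, dim] embeddings of label-preserving perturbations
    #                (should stay close to the anchor in representation space)
    # obstinate_neg: [batch, dim] embeddings of label-changing perturbations
    #                (should be pushed away from the anchor)
    anchor = F.normalize(anchor, dim=-1)
    pos = F.normalize(fickle_pos, dim=-1)
    neg = F.normalize(obstinate_neg, dim=-1)

    # Cosine similarities scaled by a temperature, as in InfoNCE.
    sim_pos = (anchor * pos).sum(dim=-1) / temperature  # [batch]
    sim_neg = (anchor * neg).sum(dim=-1) / temperature  # [batch]

    # Treat the fickle perturbation as the positive (class 0) and the
    # obstinate perturbation as the negative: minimizing cross-entropy
    # pulls positives toward the anchor and pushes negatives away.
    logits = torch.stack([sim_pos, sim_neg], dim=1)  # [batch, 2]
    labels = torch.zeros(anchor.size(0), dtype=torch.long, device=anchor.device)
    return F.cross_entropy(logits, labels)

# Hypothetical usage with embeddings from any sentence encoder:
batch, dim = 8, 256
h_anchor = torch.randn(batch, dim, requires_grad=True)
h_fickle = torch.randn(batch, dim)
h_obstinate = torch.randn(batch, dim)
loss = balanced_contrastive_loss(h_anchor, h_fickle, h_obstinate)
loss.backward()

In practice such a contrastive term would presumably be combined with the standard task loss (e.g., cross-entropy on the NLI or paraphrase labels); it stands alone here only for clarity.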


Related research

02/16/2023 · On the Effect of Adversarial Training Against Invariance-based Adversarial Examples
09/05/2019 · Adversarial Examples with Difficult Common Words for Paraphrase Identification
12/16/2021 · Towards Robust Neural Image Compression: Adversarial Attack and Model Finetuning
03/24/2023 · How many dimensions are required to find an adversarial example?
10/30/2018 · Improved Network Robustness with Adversary Critic
09/01/2021 · Towards Improving Adversarial Training of NLP Models
10/01/2021 · Calibrated Adversarial Training
