DSRM: Boost Textual Adversarial Training with Distribution Shift Risk Minimization

06/27/2023
by Songyang Gao, et al.

Adversarial training is one of the best-performing methods for improving the robustness of deep language models. However, robust models come at the cost of high time consumption, as they require multi-step gradient ascent or word substitutions to obtain adversarial samples. In addition, these generated samples are deficient in grammatical quality and semantic consistency, which impairs the effectiveness of adversarial training. To address these problems, we introduce a novel, effective procedure that instead performs adversarial training with only clean data. Our procedure, Distribution Shift Risk Minimization (DSRM), estimates the adversarial loss by perturbing the probability distribution of the input data rather than their embeddings. This formulation yields a robust model that minimizes the expected global loss under adversarial attacks. Our approach requires zero adversarial samples for training and reduces time consumption by up to 70% compared with the current best-performing adversarial training methods. Experiments demonstrate that DSRM considerably improves BERT's resistance to textual adversarial attacks and achieves state-of-the-art robust accuracy on various benchmarks.
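The abstract does not spell out the training procedure, but one natural reading of "perturbing the probability distribution of the input data" is a distributionally robust optimization step that reweights each clean batch toward its high-loss examples; for a KL-constrained shift, the inner maximization has the closed-form solution q_i ∝ exp(loss_i / τ). The sketch below illustrates that reading in PyTorch. The function names, the `temperature` parameter, and the HuggingFace-style model interface are assumptions for illustration, not the paper's actual algorithm.

```python
import torch
import torch.nn.functional as F

def worst_case_reweighted_loss(per_example_loss: torch.Tensor,
                               temperature: float = 1.0) -> torch.Tensor:
    """Estimate the adversarial loss by shifting the batch distribution
    toward high-loss examples. For a KL-constrained distribution shift,
    the worst-case weights have the closed form q_i ∝ exp(loss_i / tau);
    `temperature` plays the role of tau here (an illustrative choice)."""
    weights = torch.softmax(per_example_loss.detach() / temperature, dim=0)
    # Expectation of the loss under the adversarially shifted distribution.
    return torch.dot(weights, per_example_loss)

def training_step(model, batch, optimizer, temperature: float = 1.0) -> float:
    """One DSRM-style step on clean data only: no gradient ascent on
    embeddings and no word substitutions are required. Assumes a
    HuggingFace-style classifier whose output exposes `.logits`."""
    logits = model(batch["input_ids"],
                   attention_mask=batch["attention_mask"]).logits
    # Per-example losses, kept unreduced so the batch can be reweighted.
    losses = F.cross_entropy(logits, batch["labels"], reduction="none")
    loss = worst_case_reweighted_loss(losses, temperature)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Because the worst case is taken over reweightings of the clean batch rather than over embedding perturbations, no extra forward-backward passes are spent crafting adversarial samples, which is consistent with the speedup the abstract reports.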

