HotFlip: White-Box Adversarial Examples for NLP

12/19/2017
by Javid Ebrahimi, et al.

Adversarial examples expose vulnerabilities of machine learning models. We propose an efficient method to generate white-box adversarial examples that trick character-level and word-level neural models. Our method, HotFlip, relies on an atomic flip operation, which swaps one token for another, based on the gradients of the one-hot input vectors. In experiments on text classification and machine translation, we find that only a few manipulations are needed to greatly increase the error rates. We analyze the properties of these examples, and show that employing these adversarial examples in training can improve test-time accuracy on clean examples, as well as defend the models against adversarial examples.
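The core of the flip operation is a first-order estimate: swapping token a for token b at position i changes the loss by roughly the difference of the corresponding gradient entries of the one-hot input. A minimal sketch of that scoring step, assuming a precomputed gradient matrix from some model's backward pass (the function name and shapes here are illustrative, not the authors' code):

```python
import numpy as np

def best_hotflip(grad, token_ids):
    """Pick the single token swap that most increases the loss.

    grad: (seq_len, vocab) array -- gradient of the loss with respect
          to the one-hot input vectors (assumed to come from a model's
          backward pass; any framework can supply it).
    token_ids: current token id at each position, shape (seq_len,).
    Returns (position, new_token_id, estimated_loss_increase).
    """
    seq_len, vocab = grad.shape
    # First-order estimate of the loss change for swapping token a -> b
    # at position i: grad[i, b] - grad[i, a].
    current = grad[np.arange(seq_len), token_ids]   # grad[i, a] per position
    gain = grad - current[:, None]                  # score every candidate swap
    gain[np.arange(seq_len), token_ids] = -np.inf   # forbid no-op "swaps"
    i, b = np.unravel_index(np.argmax(gain), gain.shape)
    return int(i), int(b), float(gain[i, b])
```

On a toy 2-position, 3-token gradient this picks the single flip with the largest estimated loss increase; in practice the paper applies such flips a few at a time, which is why only a handful of manipulations suffice.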
