BAE: BERT-based Adversarial Examples for Text Classification

04/04/2020
by   Siddhant Garg, et al.

Modern text classification models are susceptible to adversarial examples: perturbed versions of the original text that humans cannot tell apart from the original but that the model misclassifies. We present BAE, a powerful black-box attack for generating grammatically correct and semantically coherent adversarial examples. BAE replaces and inserts tokens in the original text by masking a portion of the text and using a language model to generate alternatives for the masked tokens. Compared to prior work, we show that BAE mounts a stronger attack on three widely used models across seven text classification datasets.
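
The mask-and-fill idea described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: `toy_fill_mask` and `toy_classifier` are hypothetical stand-ins for a BERT masked language model and a target classifier, and the sketch does not rank which tokens to perturb or filter candidates for semantic similarity. It only shows the two perturbation types (replace a token, or insert a new one next to it) and the black-box query loop, which looks at the classifier's predicted label and nothing else.

```python
from typing import Callable, List, Optional

MASK = "[MASK]"


def toy_fill_mask(tokens: List[str]) -> List[str]:
    """Hypothetical stand-in for a masked LM: proposes fills for the [MASK] slot.

    A real masked LM would condition on the surrounding context; this toy
    version returns a fixed candidate list.
    """
    assert MASK in tokens
    return ["good", "fine", "dull", "awful"]


def toy_classifier(tokens: List[str]) -> str:
    """Hypothetical target model: a word-count sentiment classifier."""
    positive = {"good", "great", "fine"}
    negative = {"dull", "awful", "bad"}
    score = sum(t in positive for t in tokens) - sum(t in negative for t in tokens)
    return "pos" if score > 0 else "neg"


def masked_variants(tokens: List[str], i: int) -> List[List[str]]:
    """Return the two masked inputs for position i: replace, then insert."""
    replace = tokens[:i] + [MASK] + tokens[i + 1:]  # mask the token itself
    insert = tokens[:i] + [MASK] + tokens[i:]       # add a mask before the token
    return [replace, insert]


def attack(
    tokens: List[str],
    classify: Callable[[List[str]], str],
    fill_mask: Callable[[List[str]], List[str]],
) -> Optional[List[str]]:
    """Return the first perturbed text whose predicted label differs, else None."""
    original_label = classify(tokens)
    for i in range(len(tokens)):
        for masked in masked_variants(tokens, i):
            for candidate in fill_mask(masked):
                # In both variants the mask sits at index i.
                perturbed = masked[:i] + [candidate] + masked[i + 1:]
                if classify(perturbed) != original_label:
                    return perturbed
    return None


sentence = "the movie was great".split()
adversarial = attack(sentence, toy_classifier, toy_fill_mask)
```

With these toy components, `adversarial` is a one-token variant of `sentence` that the toy classifier labels differently. Swapping in a real masked language model and a real classifier would keep the same loop structure, but the candidate quality would then depend on those models.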
