Universal Adversarial Attacks with Natural Triggers for Text Classification

05/01/2020, by Liwei Song, et al.

Recent work has demonstrated the vulnerability of modern text classifiers to universal adversarial attacks, which are input-agnostic sequences of words added to any input instance. Despite being highly successful, the word sequences produced by these attacks are often unnatural, carry little semantic meaning, and can be easily distinguished from natural text. In this paper, we develop adversarial attacks that appear closer to natural English phrases and yet confuse classification systems when added to benign inputs. To achieve this, we leverage an adversarially regularized autoencoder (ARAE) to generate triggers and propose a gradient-based search method to output natural text that fools a target classifier. Experiments on two different classification tasks demonstrate that our attacks are effective while also being less identifiable than previous approaches under three simple detection metrics.
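To make the idea of a gradient-based universal trigger search concrete, here is a toy sketch. It is not the paper's ARAE method: it attacks a hypothetical linear bag-of-embeddings classifier with a HotFlip-style first-order token swap, and all names, sizes, and values are illustrative assumptions. Because the toy model is linear, the first-order swap criterion is exact.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup (hypothetical): vocabulary of 50 tokens, 8-dim embeddings,
# and a linear classifier over the average word embedding.
V, D = 50, 8
emb = rng.normal(size=(V, D))
w = rng.normal(size=D)  # classifier weights; score > 0 => class 1

def score(token_ids):
    """Class-1 score of a token sequence under the toy classifier."""
    return float(emb[token_ids].mean(axis=0) @ w)

def attack_trigger(input_ids, trigger_len=3, steps=10):
    """Greedy gradient-guided trigger search (HotFlip-style):
    repeatedly replace the trigger token whose swap most decreases
    the class-1 score of (trigger + input)."""
    trigger = list(rng.integers(0, V, size=trigger_len))
    n = trigger_len + len(input_ids)
    grad = w / n  # gradient of score w.r.t. any position's embedding
    for _ in range(steps):
        improved = False
        for i in range(trigger_len):
            # First-order score change for swapping slot i to each candidate:
            # (emb[cand] - emb[trigger[i]]) @ grad
            deltas = (emb - emb[trigger[i]]) @ grad
            cand = int(np.argmin(deltas))
            if deltas[cand] < 0:
                trigger[i] = cand
                improved = True
        if not improved:
            break
    return trigger

benign = [1, 2, 3, 4]
trig = attack_trigger(benign)
print(score(benign), score(trig + benign))  # trigger lowers the score
```

The paper's contribution constrains this kind of search so the trigger decodes to natural-looking text (via the ARAE latent space) rather than an arbitrary token sequence, which the sketch above does not attempt.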


Related research

06/09/2019 · On the Vulnerability of Capsule Networks to Adversarial Attacks
This paper extensively evaluates the vulnerability of capsule networks t...

09/25/2021 · MINIMAL: Mining Models for Data Free Universal Adversarial Triggers
It is well known that natural language models are vulnerable to adversar...

03/09/2020 · Gradient-based adversarial attacks on categorical sequence models via traversing an embedded world
An adversarial attack paradigm explores various scenarios for vulnerabil...

10/06/2020 · Poison Attacks against Text Datasets with Conditional Adversarially Regularized Autoencoder
This paper demonstrates a fatal vulnerability in natural language infere...

01/22/2019 · Universal Rules for Fooling Deep Neural Networks based Text Classification
Recently, deep learning based natural language processing techniques are...

11/28/2022 · Attack on Unfair ToS Clause Detection: A Case Study using Universal Adversarial Triggers
Recent work has demonstrated that natural language processing techniques...

09/17/2020 · Generating Label Cohesive and Well-Formed Adversarial Claims
Adversarial attacks reveal important vulnerabilities and flaws of traine...
