Contrasting Human- and Machine-Generated Word-Level Adversarial Examples for Text Classification

09/09/2021
by Maximilian Mozes, et al.

Natural language processing models are widely considered vulnerable to adversarial attacks, but recent work has drawn attention to the issue of validating these adversarial inputs against certain criteria (e.g., the preservation of semantics and grammaticality). Enforcing constraints that uphold such criteria may render attacks unsuccessful, raising the question of whether valid attacks are actually feasible. In this work, we investigate this question through the lens of human language ability. We report on crowdsourcing studies in which we task humans with iteratively modifying words in an input text, while receiving immediate model feedback, with the aim of causing a sentiment classification model to misclassify the example. Our findings suggest that humans are capable of generating a substantial number of adversarial examples using semantics-preserving word substitutions. We analyze how human-generated adversarial examples compare to the recently proposed TextFooler, Genetic, BAE, and SememePSO attack algorithms along the dimensions of naturalness, sentiment preservation, grammaticality, and substitution rate. Our findings suggest that humans are not better than the best attack algorithms at generating natural-reading, sentiment-preserving examples, and the algorithms do so at a fraction of the computational cost.
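
To make the iterative, feedback-driven substitution procedure concrete, below is a minimal sketch of a greedy word-level attack in the spirit of TextFooler. Everything here (toy_positive_prob, SYNONYMS, greedy_word_substitution_attack) is a hypothetical stand-in rather than the paper's code: a real attack would query a trained sentiment classifier and draw substitution candidates from an embedding or sememe resource.

```python
# Illustrative sketch only, not the paper's implementation. The classifier
# and synonym table are toy stand-ins for a trained model and a semantic
# neighbour resource.
from typing import Callable, Dict, List

# Hypothetical synonym table standing in for a semantic-neighbour resource.
SYNONYMS: Dict[str, List[str]] = {
    "great": ["fine", "decent"],
    "loved": ["liked", "appreciated"],
    "brilliant": ["clever", "bright"],
}

def toy_positive_prob(text: str) -> float:
    """Stand-in sentiment model: estimates P(positive) from keyword counts."""
    positive_words = {"great", "loved", "brilliant", "fine"}
    score = sum(w in positive_words for w in text.lower().split())
    return min(1.0, 0.4 + 0.2 * score)

def greedy_word_substitution_attack(
    text: str,
    positive_prob: Callable[[str], float],
    threshold: float = 0.5,
) -> str:
    """Greedily swap one word at a time, keeping any swap that lowers
    P(positive), and stop once the predicted label flips. This mirrors the
    edit-then-observe-model-feedback loop given to the crowdworkers."""
    original_words = text.split()
    words = list(original_words)
    best_prob = positive_prob(text)
    for i, word in enumerate(original_words):
        for candidate in SYNONYMS.get(word.lower(), []):
            trial = words[:i] + [candidate] + words[i + 1:]
            prob = positive_prob(" ".join(trial))
            if prob < best_prob:  # keep the most damaging substitution
                words, best_prob = trial, prob
        if best_prob < threshold:  # label flipped: attack succeeded
            break
    return " ".join(words)

if __name__ == "__main__":
    original = "great film I loved the brilliant ending"
    adversarial = greedy_word_substitution_attack(original, toy_positive_prob)
    print(adversarial)  # "decent film I liked the clever ending"
    print(toy_positive_prob(adversarial))  # 0.4, below 0.5: now negative
```

The greedy, one-word-at-a-time loop matches the iterative feedback setting of the crowdsourcing study; the Genetic and SememePSO attacks instead use population-based search over a similar substitution space.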

Related research

Identifying Human Strategies for Generating Word-Level Adversarial Examples (10/20/2022)
Adversarial examples in NLP are receiving increasing research attention....

Frequency-Guided Word Substitutions for Detecting Textual Adversarial Examples (04/13/2020)
While recent efforts have shown that neural text processing models are v...

Reevaluating Adversarial Examples in Natural Language (04/25/2020)
State-of-the-art attacks on NLP models have different definitions of wha...

Preserving Semantics in Textual Adversarial Attacks (11/08/2022)
Adversarial attacks in NLP challenge the way we look at language models....

Discrete Attacks and Submodular Optimization with Applications to Text Classification (12/01/2018)
Adversarial examples are carefully constructed modifications to an input...

Grey-box Adversarial Attack And Defence For Sentiment Classification (03/22/2021)
We introduce a grey-box adversarial attack and defence framework for sen...

Constructing Semantics-Aware Adversarial Examples with Probabilistic Perspective (06/01/2023)
In this study, we introduce a novel, probabilistic viewpoint on adversar...
