Phrase-level Textual Adversarial Attack with Label Preservation

05/22/2022
by Yibin Lei, et al.

Generating high-quality textual adversarial examples is critical for investigating the pitfalls of natural language processing (NLP) models and for further promoting their robustness. Existing attacks are usually realized through word-level or sentence-level perturbations, which either limit the perturbation space or sacrifice fluency and textual quality, both of which hurt attack effectiveness. In this paper, we propose Phrase-Level Textual Adversarial aTtack (PLAT), which generates adversarial samples through phrase-level perturbations. PLAT first extracts vulnerable phrases as attack targets using a syntactic parser, and then perturbs them with a pre-trained blank-infilling model. This flexible perturbation design substantially expands the search space for more effective attacks without introducing too many modifications, while maintaining textual fluency and grammaticality via contextualized generation conditioned on the surrounding text. Moreover, we develop a label-preservation filter that leverages the likelihoods of language models fine-tuned on each class, rather than textual similarity, to rule out perturbations that would likely alter the original class label for human readers. Extensive experiments and human evaluation demonstrate that PLAT achieves superior attack effectiveness as well as better label consistency than strong baselines.
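
The abstract outlines a two-stage pipeline: blank-infilling over extracted phrases to generate candidate perturbations, then a per-class language-model filter to keep the label intact. As an illustration only, the following is a minimal Python sketch of the infilling step, assuming Hugging Face transformers with T5 as a stand-in blank-infilling model; the function infill_phrase and all parameter choices are hypothetical, not the authors' released code, and the target phrase is assumed to come from a separate syntactic parser.

# Hypothetical sketch (not the authors' code): perturb one extracted phrase
# with a blank-infilling model. T5's <extra_id_0> sentinel marks the blank.
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-base")
model = T5ForConditionalGeneration.from_pretrained("t5-base")

def infill_phrase(text: str, phrase: str, num_candidates: int = 5) -> list[str]:
    """Mask `phrase` in `text` and let the model propose contextual replacements."""
    masked = text.replace(phrase, "<extra_id_0>", 1)
    inputs = tokenizer(masked, return_tensors="pt")
    outputs = model.generate(
        **inputs,
        max_new_tokens=16,
        num_beams=num_candidates,
        num_return_sequences=num_candidates,
    )
    candidates = []
    for seq in outputs:
        decoded = tokenizer.decode(seq, skip_special_tokens=False)
        # The generated span sits between the <extra_id_0> and <extra_id_1> sentinels.
        start = decoded.find("<extra_id_0>") + len("<extra_id_0>")
        end = decoded.find("<extra_id_1>")
        span = (decoded[start:end] if end != -1 else decoded[start:])
        span = span.replace("</s>", "").strip()
        if span and span.lower() != phrase.lower():
            candidates.append(text.replace(phrase, span, 1))
    return candidates

The label-preservation filter can be sketched in the same spirit: score each candidate under a causal language model fine-tuned on each class (GPT-2 here is an assumed stand-in for the paper's class-conditional models), and keep the candidate only if the original class's model still assigns it the highest length-normalized likelihood. The names preserves_label and class_lms are hypothetical.

# Hypothetical sketch (not the authors' code): keep a candidate only if the
# language model fine-tuned on the original class still scores it highest.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

def log_likelihood(model, tokenizer, text: str) -> float:
    """Length-normalized log-likelihood of `text` under a causal LM."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        loss = model(ids, labels=ids).loss  # mean per-token negative log-likelihood
    return -loss.item()

def preserves_label(candidate: str, orig_label: int, class_lms: list) -> bool:
    """`class_lms[c]` holds a (model, tokenizer) pair fine-tuned on class-c text."""
    scores = [log_likelihood(m, t, candidate) for m, t in class_lms]
    return max(range(len(scores)), key=scores.__getitem__) == orig_label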
