TransFool: An Adversarial Attack against Neural Machine Translation Models

02/02/2023
by Sahar Sadrizadeh, et al.

Deep neural networks have been shown to be vulnerable to small perturbations of their inputs, known as adversarial attacks. In this paper, we investigate the vulnerability of Neural Machine Translation (NMT) models to adversarial attacks and propose a new attack algorithm called TransFool. To fool NMT models, TransFool builds on a multi-term optimization problem and a gradient projection step. By integrating the embedding representation of a language model, we generate fluent adversarial examples in the source language that maintain a high level of semantic similarity with the clean samples. Experimental results demonstrate that, for different translation tasks and NMT architectures, our white-box attack can severely degrade the translation quality while the semantic similarity between the original and the adversarial sentences remains high. Moreover, we show that TransFool is transferable to unknown target models. Finally, based on automatic and human evaluations, TransFool improves over existing attacks in terms of success rate, semantic similarity, and fluency, in both white-box and black-box settings. TransFool thus allows us to better characterize the vulnerability of NMT models and highlights the need for strong defense mechanisms and more robust NMT systems in real-life applications.
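To make the structure of such an attack concrete, below is a minimal, self-contained sketch of a multi-term optimization loop with a gradient projection step, written in PyTorch. It is not the paper's released implementation: the random embedding table, the toy_nmt_loss placeholder, and the hyper-parameters alpha, lr, and n_steps are illustrative stand-ins, and in TransFool itself the similarity term and the projection operate in a language model's embedding space rather than on a toy table.

```python
# Illustrative sketch only: toy embedding table, toy loss, toy hyper-parameters.
import torch
import torch.nn.functional as F

torch.manual_seed(0)
vocab_size, emb_dim, seq_len = 1000, 64, 12

# Stand-ins for the target model's input-embedding table and the clean sentence.
emb_table = torch.randn(vocab_size, emb_dim)
clean_ids = torch.randint(0, vocab_size, (seq_len,))
clean_embeds = emb_table[clean_ids]


def toy_nmt_loss(embeds):
    """Placeholder for the NMT model's translation loss given input embeddings.

    In a real white-box attack this would be the target model's cross-entropy
    against the reference translation, which the attacker tries to increase.
    """
    return embeds.sum(dim=-1).tanh().mean()


alpha, lr, n_steps = 10.0, 0.5, 30
adv_embeds = clean_embeds.clone().requires_grad_(True)

for _ in range(n_steps):
    # Multi-term objective: degrade the translation while staying close to the
    # clean sentence in embedding space (the paper uses a language model's
    # embedding representation for this term, which also promotes fluency).
    loss = -toy_nmt_loss(adv_embeds) + alpha * F.mse_loss(adv_embeds, clean_embeds)
    (grad,) = torch.autograd.grad(loss, adv_embeds)

    with torch.no_grad():
        # Continuous gradient step on the relaxed (embedding-space) problem...
        adv_embeds -= lr * grad
        # ...followed by a projection of every position onto its nearest token
        # embedding, so the perturbed input maps back to valid tokens.
        dists = torch.cdist(adv_embeds, emb_table)   # (seq_len, vocab_size)
        adv_ids = dists.argmin(dim=-1)
        adv_embeds.copy_(emb_table[adv_ids])

print("clean ids:", clean_ids.tolist())
print("adv   ids:", adv_ids.tolist())
```

The projection step is what turns the relaxed, continuous optimization back into an actual sentence: after each update, every position is snapped to a valid token, and the multi-term loss keeps the resulting sentence close to the original while pushing the translation quality down.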
