Assessing Robustness of Text Classification through Maximal Safe Radius Computation

10/01/2020
by Emanuele La Malfa, et al.

Neural network NLP models are vulnerable to small modifications of the input that maintain the original meaning but result in a different prediction. In this paper, we focus on the robustness of text classification against word substitutions, aiming to provide guarantees that the model prediction does not change if a word is replaced with a plausible alternative, such as a synonym. As a measure of robustness, we adopt the notion of the maximal safe radius for a given input text, which is the minimum distance in the embedding space to the decision boundary. Since computing the exact maximal safe radius is not feasible in practice, we instead approximate it by computing lower and upper bounds. For the upper bound computation, we employ Monte Carlo Tree Search in conjunction with syntactic filtering to analyse the effect of single and multiple word substitutions. The lower bound computation is achieved through an adaptation of the linear bounding techniques implemented in the tools CNN-Cert and POPQORN, for convolutional and recurrent network models, respectively. We evaluate the methods on sentiment analysis and news classification models for four datasets (IMDB, SST, AG News and NEWS) and a range of embeddings, and provide an analysis of robustness trends. We also apply our framework to interpretability analysis and compare it with LIME.
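To make the central quantity concrete (notation chosen here for illustration, not taken from the paper): for a classifier f and an input text embedded as x, the maximal safe radius is

    MSR(x) = min { ||x - x'|| : f(x') ≠ f(x) },

the distance from x to the nearest point in embedding space where the prediction changes. Any concrete substitution that flips the prediction therefore yields an upper bound on MSR(x), while a certified lower bound guarantees that no perturbation within that radius changes the label.

As a rough sketch of the upper-bound side (a greedy single-substitution scan, not the Monte Carlo Tree Search used in the paper), the Python below records the smallest embedding-space distance at which a synonym substitution flips the prediction. The interfaces model, embed and synonyms are hypothetical placeholders, not APIs from the paper or from CNN-Cert/POPQORN:

import numpy as np

def msr_upper_bound(words, model, embed, synonyms):
    # Any substitution that changes the prediction is an adversarial
    # example, so its distance upper-bounds the maximal safe radius.
    # Hypothetical interfaces (not from the paper):
    #   model(list_of_words) -> predicted class label
    #   embed(word)          -> numpy embedding vector of the word
    #   synonyms(word)       -> iterable of candidate replacements
    original_label = model(words)
    best = np.inf
    for i, word in enumerate(words):
        for candidate in synonyms(word):
            perturbed = words[:i] + [candidate] + words[i + 1:]
            if model(perturbed) != original_label:
                # Only one word differs, so the text-level distance
                # reduces to the distance between the two word vectors.
                dist = np.linalg.norm(embed(word) - embed(candidate))
                best = min(best, dist)
    return best  # np.inf if no single substitution flips the prediction

The tree search described in the abstract extends this idea to multiple simultaneous substitutions, which can flip predictions that no single replacement changes, and the syntactic filtering restricts candidates to plausible replacements.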

Related research

Quantifying Robustness to Adversarial Word Substitutions (01/11/2022)
Deep-learning-based NLP models are found to be vulnerable to word substi...

A Game-Based Approximate Verification of Deep Neural Networks with Provable Guarantees (07/10/2018)
Despite the improved accuracy of deep neural networks, the discovery of ...

Text-CRS: A Generalized Certified Robustness Framework against Textual Adversarial Attacks (07/31/2023)
The language models, especially the basic text classification models, ha...

Certified Robustness to Adversarial Word Substitutions (09/03/2019)
State-of-the-art NLP models can often be fooled by adversaries that appl...

SeMemNN: A Semantic Matrix-Based Memory Neural Network for Text Classification (03/04/2020)
Text categorization is the task of assigning labels to documents written...

Robustness Guarantees for Deep Neural Networks on Videos (06/28/2019)
The widespread adoption of deep learning models places demands on their ...

Simpler Certified Radius Maximization by Propagating Covariances (04/13/2021)
One strategy for adversarially training a robust model is to maximize it...
