Adversarial Examples for Extreme Multilabel Text Classification

by Mohammadreza Qaraei, et al.

Extreme Multilabel Text Classification (XMTC) is a text classification problem in which (i) the output space is extremely large, (ii) each data point may have multiple positive labels, and (iii) the data follows a strongly imbalanced distribution. Driven by applications in recommendation systems and automatic tagging of web-scale documents, research on XMTC has focused on improving prediction accuracy and handling imbalanced data. However, the robustness of deep-learning-based XMTC models against adversarial examples has been largely underexplored. In this paper, we investigate the behaviour of XMTC models under adversarial attacks. To this end, we first define adversarial attacks in multilabel text classification problems. We categorize attacks on multilabel text classifiers as (a) positive-targeted, where the target positive label should fall out of the top-k predicted labels, and (b) negative-targeted, where the target negative label should be among the top-k predicted labels. Then, through experiments on APLC-XLNet and AttentionXML, we show that XMTC models are highly vulnerable to positive-targeted attacks but more robust to negative-targeted ones. Furthermore, our experiments show that the success rate of positive-targeted adversarial attacks follows an imbalanced distribution: tail classes are highly vulnerable, and an attacker can generate adversarial samples for them with high similarity to the actual data points. To overcome this problem, we explore the effect of rebalanced loss functions in XMTC, where they not only increase accuracy on tail classes but also improve the robustness of these classes against adversarial attacks. The code for our experiments is available at
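The two attack categories above reduce to a simple success criterion over the model's top-k predictions. As a minimal sketch (function names and the toy scores are illustrative, not taken from the paper's code), a positive-targeted attack succeeds when it pushes the target positive label out of the top-k, while a negative-targeted attack succeeds when it pulls the target negative label into the top-k:

```python
import numpy as np

def topk_labels(scores, k):
    """Indices of the k highest-scoring labels."""
    return set(np.argsort(scores)[-k:])

def positive_attack_succeeds(scores, target_label, k):
    """Positive-targeted: the target positive label falls out of the top-k."""
    return target_label not in topk_labels(scores, k)

def negative_attack_succeeds(scores, target_label, k):
    """Negative-targeted: the target negative label enters the top-k."""
    return target_label in topk_labels(scores, k)

# Toy example: 6 labels, k = 3; scores are hypothetical model outputs
# for a perturbed input.
scores = np.array([0.9, 0.1, 0.8, 0.05, 0.7, 0.2])
print(topk_labels(scores, 3))                   # {0, 2, 4}
print(positive_attack_succeeds(scores, 1, 3))   # True: label 1 is outside the top-3
print(negative_attack_succeeds(scores, 1, 3))   # False
```

In practice the attacker perturbs the input text (e.g. via word substitutions) and re-scores it until one of these criteria is met, subject to a similarity constraint on the perturbation.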




