Fundamental Tradeoffs between Invariance and Sensitivity to Adversarial Perturbations

02/11/2020
by Florian Tramèr et al.

Adversarial examples are malicious inputs crafted to induce misclassification. Commonly studied sensitivity-based adversarial examples introduce semantically small changes to an input that result in a different model prediction. This paper studies a complementary failure mode, invariance-based adversarial examples, which introduce minimal semantic changes that modify an input's true label yet preserve the model's prediction. We demonstrate fundamental tradeoffs between these two types of adversarial examples. We show that defenses against sensitivity-based attacks actively harm a model's accuracy on invariance-based attacks, and that new approaches are needed to resist both attack types. In particular, we break state-of-the-art adversarially trained and certifiably robust models by generating small perturbations that the models are (provably) robust to, yet that change an input's class according to human labelers. Finally, we formally show that the existence of excessively invariant classifiers arises from the presence of overly robust predictive features in standard datasets.
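To make the distinction concrete, below is a minimal, illustrative PyTorch sketch (not the paper's code) of the two attack types, both restricted to an L-infinity ball of radius eps around an input x assumed to lie in [0, 1]. The model, the inputs, and the simple invariance-attack construction are hypothetical placeholders.

import torch
import torch.nn.functional as F

def sensitivity_attack_fgsm(model, x, y, eps):
    # Sensitivity-based attack (one FGSM step): a semantically small
    # perturbation inside the eps-ball that aims to flip the model's
    # prediction while the input's true label stays the same.
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    x_adv = x + eps * x.grad.sign()        # ascend the loss, stay eps-close to x
    return x_adv.clamp(0.0, 1.0).detach()

def invariance_attack(model, x, x_other_class, eps):
    # Invariance-based attack (schematic): search inside the same eps-ball
    # for a point whose true label has changed (here, by nudging x toward an
    # example of a different class) but whose model prediction has not.
    # The paper's attacks are more careful and validated by human labelers.
    delta = (x_other_class - x).clamp(-eps, eps)   # stay eps-close to x
    x_inv = (x + delta).clamp(0.0, 1.0)
    unchanged = model(x_inv).argmax(dim=1) == model(x).argmax(dim=1)
    return x_inv, unchanged

The point of the contrast: a defense that enlarges the set of inputs mapped to the same prediction helps against the first function but, by construction, makes it easier for the second to succeed.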
