Adversarial Perturbations Are Not So Weird: Entanglement of Robust and Non-Robust Features in Neural Network Classifiers

02/09/2021
by Jacob M. Springer, et al.

Neural networks trained on visual data are well-known to be vulnerable to often imperceptible adversarial perturbations. The reasons for this vulnerability are still being debated in the literature. Recently, Ilyas et al. (2019) showed that this vulnerability arises, in part, because neural network classifiers rely on highly predictive but brittle "non-robust" features. In this paper, we extend the work of Ilyas et al. by investigating the nature of the input patterns that give rise to these features. In particular, we hypothesize that in a neural network trained in a standard way, non-robust features respond to small, "non-semantic" patterns that are typically entangled with larger, robust patterns, known to be more human-interpretable, as opposed to solely responding to statistical artifacts in a dataset. Thus, adversarial examples can be formed via minimal perturbations to these small, entangled patterns. In addition, we demonstrate a corollary of our hypothesis: robust classifiers are more effective than standard (non-robust) ones as a source for generating transferable adversarial examples in both the untargeted and targeted settings. The results we present in this paper provide new insight into the nature of the non-robust features responsible for adversarial vulnerability of neural network classifiers.
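To make the transfer setting in the abstract concrete, below is a minimal sketch, assuming PyTorch, of how one might craft L-infinity PGD adversarial examples against a source classifier and measure how often they also fool a separate target classifier. The names source_model, target_model, and loader are hypothetical placeholders, and the epsilon, step size, and iteration count are illustrative defaults rather than the settings used in the paper.

import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=8/255, alpha=2/255, steps=10, targeted=False):
    # Projected gradient descent within an L-infinity ball of radius eps around x.
    # For a targeted attack, y holds the desired target labels and the loss is descended.
    x_adv = x.clone().detach()
    x_adv = torch.clamp(x_adv + torch.empty_like(x_adv).uniform_(-eps, eps), 0, 1)
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        with torch.no_grad():
            step = alpha * grad.sign()
            x_adv = x_adv - step if targeted else x_adv + step
            # Project back into the eps-ball and the valid pixel range.
            x_adv = torch.clamp(x + torch.clamp(x_adv - x, -eps, eps), 0, 1)
    return x_adv.detach()

def transfer_rate(source_model, target_model, loader, device="cpu"):
    # Fraction of adversarial examples crafted against source_model that are
    # misclassified by target_model (untargeted transfer success).
    source_model.eval()
    target_model.eval()
    fooled, total = 0, 0
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        x_adv = pgd_attack(source_model, x, y)
        with torch.no_grad():
            pred = target_model(x_adv).argmax(dim=1)
        fooled += (pred != y).sum().item()
        total += y.numel()
    return fooled / total

The comparison suggested by the paper's corollary would call transfer_rate twice, once with a robust (adversarially trained) source model and once with a standardly trained one, against the same target; the comparison itself is left to the caller here.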


Related research

06/03/2021
A Little Robustness Goes a Long Way: Leveraging Universal Features for Targeted Transfer Attacks
Adversarial examples for neural network image classifiers are known to b...

03/01/2023
Adversarial Examples Exist in Two-Layer ReLU Networks for Low Dimensional Data Manifolds
Despite a great deal of research, it is still not well-understood why tr...

06/19/2020
Using Learning Dynamics to Explore the Role of Implicit Regularization in Adversarial Examples
Recent work (Ilyas et al., 2019) suggests that adversarial examples are f...

02/11/2020
Fundamental Tradeoffs between Invariance and Sensitivity to Adversarial Perturbations
Adversarial examples are malicious inputs crafted to induce misclassific...

05/30/2018
There Is No Free Lunch In Adversarial Robustness (But There Are Unexpected Benefits)
We provide a new understanding of the fundamental nature of adversariall...

11/12/2021
Neural Population Geometry Reveals the Role of Stochasticity in Robust Perception
Adversarial examples are often cited by neuroscientists and machine lear...

09/27/2021
Classification and Adversarial examples in an Overparameterized Linear Model: A Signal Processing Perspective
State-of-the-art deep learning classifiers are heavily overparameterized...
