The Limitations of Adversarial Training and the Blind-Spot Attack

01/15/2019
by Huan Zhang, et al.

The adversarial training procedure proposed by Madry et al. (2018) is one of the most effective methods for defending deep neural networks (DNNs) against adversarial examples. In our paper, we shed some light on the practicality and the hardness of adversarial training by showing that its effectiveness (robustness on the test set) correlates strongly with the distance between a test point and the manifold of training data embedded by the network. Test examples that lie relatively far from this manifold are more likely to be vulnerable to adversarial attacks. Consequently, a defense based on adversarial training is susceptible to a new class of attacks, the "blind-spot attack", in which the input images reside in "blind spots" (low-density regions) of the empirical distribution of training data but still lie on the ground-truth data manifold. For MNIST, we found that such blind spots can be located easily by simply scaling and shifting image pixel values. Most importantly, for large datasets with high-dimensional and complex data manifolds (CIFAR, ImageNet, etc.), the existence of blind spots makes defending all valid test examples difficult, due to the curse of dimensionality and the scarcity of training data. Additionally, we find that blind spots also exist for provable defenses, including Wong & Kolter (2018) and Sinha et al. (2018), because these trainable robustness certificates can only be practically optimized on a limited set of training data.
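The scale-and-shift construction mentioned above is simple enough to sketch. Below is a minimal illustration, assuming images with pixel values in [0, 1]; the function name and the example (alpha, beta) values are our own and purely illustrative, not the exact settings used in the paper's experiments.

```python
import numpy as np


def scale_and_shift(x: np.ndarray, alpha: float = 0.7, beta: float = 0.15) -> np.ndarray:
    """Return clip(alpha * x + beta, 0, 1).

    With alpha < 1 and a small beta > 0, an MNIST digit stays clearly
    recognizable (i.e., remains on the ground-truth data manifold), but
    its pixel-value statistics move away from the empirical training
    distribution -- a candidate "blind spot" of adversarial training.
    """
    return np.clip(alpha * x + beta, 0.0, 1.0)


# The blind-spot attack then runs an ordinary adversarial attack
# (e.g., PGD or C&W) starting from the transformed image rather than
# the original test image:
#
#   x_blind = scale_and_shift(x_test)
#   x_adv = pgd_attack(model, x_blind, y_test, eps=0.3)  # hypothetical helper
#
# If the defense's robustness depends on proximity to the training
# manifold, attacks launched from x_blind succeed far more often than
# attacks launched from x_test.
```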


