Understanding Adversarial Robustness Against On-manifold Adversarial Examples

by Jiancong Xiao, et al.
The Chinese University of Hong Kong, Shenzhen

Deep neural networks (DNNs) have been shown to be vulnerable to adversarial examples: a well-trained model can easily be attacked by adding small perturbations to the original data. One hypothesis for the existence of adversarial examples is the off-manifold assumption, which holds that adversarial examples lie off the data manifold. However, recent research has shown that on-manifold adversarial examples also exist. In this paper, we revisit the off-manifold assumption and study the following question: to what extent is the poor performance of neural networks against adversarial attacks due to on-manifold adversarial examples? Since the true data manifold is unknown in practice, we consider two approximations of on-manifold adversarial examples on both real and synthetic datasets. On real datasets, we show that on-manifold adversarial examples achieve higher attack rates than off-manifold adversarial examples against both standard-trained and adversarially trained models. On synthetic datasets, we prove theoretically that on-manifold adversarial examples are powerful, yet adversarial training focuses on off-manifold directions and ignores on-manifold adversarial examples. Furthermore, we provide analysis showing that the theoretically derived properties can also be observed in practice. Our analysis suggests that on-manifold adversarial examples are important and deserve more attention when training robust models.
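To make the on-manifold versus off-manifold distinction concrete, here is a minimal sketch that assumes the data manifold is locally approximated by a linear subspace with a known orthonormal basis (for example, the top principal directions of the data). This linear approximation is an illustrative assumption, not necessarily either of the two approximations used in the paper, and the names decompose and on_manifold_step are hypothetical. The projection of a perturbation onto the subspace is treated as its on-manifold component, and the residual as its off-manifold component; an on-manifold attack step then moves only along the projected loss gradient.

```python
import math
from typing import List, Tuple

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Euclidean inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def decompose(delta: Vector, basis: List[Vector]) -> Tuple[Vector, Vector]:
    """Split `delta` into (on-manifold, off-manifold) components.

    Assumption: `basis` is an orthonormal set of vectors spanning a
    linear approximation of the data manifold (hypothetical setup).
    """
    on = [0.0] * len(delta)
    for u in basis:
        coeff = dot(delta, u)  # coordinate of delta along basis vector u
        on = [o + coeff * ui for o, ui in zip(on, u)]
    off = [d - o for d, o in zip(delta, on)]  # residual orthogonal to the subspace
    return on, off


def on_manifold_step(grad: Vector, basis: List[Vector], eps: float) -> Vector:
    """L2-bounded perturbation along the on-manifold part of a loss gradient.

    `grad` is assumed to be the gradient of the model's loss w.r.t. the input.
    """
    on, _ = decompose(grad, basis)
    norm = math.sqrt(dot(on, on))
    if norm == 0.0:
        return [0.0] * len(grad)  # gradient is entirely off-manifold
    return [eps * x / norm for x in on]
```

For example, with basis [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]] and gradient [3.0, 4.0, 12.0], decompose returns on = [3.0, 4.0, 0.0] and off = [0.0, 0.0, 12.0], and on_manifold_step(grad, basis, 0.5) returns approximately [0.3, 0.4, 0.0]. Here most of the gradient's magnitude lies off the subspace, which is the kind of direction a purely off-manifold attack would exploit.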

