DI-AA: An Interpretable White-box Attack for Fooling Deep Neural Networks

10/14/2021
by Yixiang Wang, et al.

White-box Adversarial Example (AE) attacks on Deep Neural Networks (DNNs) are more destructive than black-box AE attacks. However, almost all white-box approaches lack interpretability from the perspective of the DNN itself: adversaries do not mount attacks through interpretable features, and few approaches consider which features the DNN actually learns. In this paper, we propose DI-AA, an interpretable white-box AE attack that applies the interpretability technique of deep Taylor decomposition to select the most contributing features, and adopts Lagrangian relaxation to jointly optimize the logit output and the L_p norm so as to further reduce the perturbation. We compare DI-AA with six baseline attacks (including the state-of-the-art AutoAttack) on three datasets. Experimental results reveal that our approach can 1) attack non-robust models with comparatively low perturbation, close to or lower than that of AutoAttack; 2) break TRADES adversarially trained models with the highest success rate; and 3) generate AEs that transfer, reducing the robust accuracy of robust black-box models by 16%.
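The abstract describes two components: a deep Taylor decomposition relevance map that selects the most contributing input features, and a Lagrangian relaxation that jointly minimizes a logit-based misclassification loss and the L_p norm of the perturbation. Below is a minimal PyTorch sketch of that two-step idea, not the authors' DI-AA implementation: relevance is approximated by gradient x input (a common stand-in for deep Taylor decomposition), and all names (model, x, label, lam, keep_ratio, steps) are illustrative. The sketch assumes a single image in [0, 1] with batch dimension 1.

```python
# A minimal sketch of the two ideas in the abstract; hypothetical, not DI-AA itself.
import torch
import torch.nn.functional as F

def relevance_mask(model, x, label, keep_ratio=0.1):
    """Keep the top fraction of features by an approximate relevance score.
    Gradient x input is used here as a stand-in for deep Taylor decomposition."""
    x = x.clone().detach().requires_grad_(True)
    model(x)[0, label].backward()
    relevance = (x.grad * x).abs().flatten()
    k = max(1, int(keep_ratio * relevance.numel()))
    mask = torch.zeros_like(relevance)
    mask[relevance.topk(k).indices] = 1.0      # most contributing features only
    return mask.view_as(x)

def di_aa_sketch(model, x, label, lam=0.1, steps=200, lr=0.01, p=2):
    """Minimize a logit-margin loss plus lam * ||delta||_p (Lagrangian relaxation),
    perturbing only the high-relevance features selected above."""
    x = x.detach()
    mask = relevance_mask(model, x, label)
    delta = (1e-3 * torch.randn_like(x)).requires_grad_(True)  # avoid NaN grad of ||0||
    opt = torch.optim.Adam([delta], lr=lr)
    for _ in range(steps):
        logits = model(x + delta * mask)
        others = logits[0].clone()
        others[label] = float("-inf")
        margin = logits[0, label] - others.max()   # > 0 while prediction is correct
        loss = F.relu(margin) + lam * delta.norm(p=p)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return (x + delta.detach() * mask).clamp(0.0, 1.0)
```

The relu keeps the margin term active only while the model still classifies the input correctly; once the attack succeeds, only the norm term remains and the perturbation shrinks, which mirrors the low-perturbation goal stated in the abstract.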


