Beating Attackers At Their Own Games: Adversarial Example Detection Using Adversarial Gradient Directions

12/31/2020
by Yuhang Wu, et al.

Adversarial examples are inputs specifically crafted to deceive machine learning classifiers. State-of-the-art adversarial example detection methods characterize an input as adversarial either by quantifying the magnitude of feature variations under multiple perturbations or by measuring its distance from an estimated benign example distribution. Instead of using such metrics, the proposed method is based on the observation that the directions of adversarial gradients, when crafting (new) adversarial examples, play a key role in characterizing the adversarial space. Compared to detection methods that rely on multiple perturbations, the proposed method is efficient, as it applies only a single random perturbation to the input example. Experiments conducted on two different databases, CIFAR-10 and ImageNet, show that the proposed detection method achieves, respectively, 97.9% and 98.6% AUC-ROC (on average) on five different adversarial attacks, and outperforms multiple state-of-the-art detection methods. These results demonstrate the effectiveness of using adversarial gradient directions for adversarial example detection.
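The detection idea admits a compact illustration. Below is a minimal, hypothetical PyTorch sketch, not the authors' implementation: it computes an adversarial gradient direction, approximated here by the input gradient of the cross-entropy loss as used in FGSM-style attacks, at the original input and at a single randomly perturbed copy, and compares the two directions with cosine similarity. The model, noise scale sigma, and decision threshold are all assumptions made for illustration.

```python
import torch
import torch.nn.functional as F

def gradient_direction_similarity(model, x, sigma=0.05):
    """Cosine similarity between adversarial gradient directions computed
    at x and at a single randomly perturbed copy of x (illustrative sketch,
    not the paper's exact detector)."""
    def adv_grad(inp):
        inp = inp.clone().detach().requires_grad_(True)
        logits = model(inp)
        # Input gradient of the cross-entropy loss toward the model's own
        # prediction: the direction an FGSM-style attacker would follow
        # when crafting a (new) adversarial example from this point.
        loss = F.cross_entropy(logits, logits.argmax(dim=1))
        (grad,) = torch.autograd.grad(loss, inp)
        return grad.flatten(start_dim=1)

    g_orig = adv_grad(x)                                # gradient at the input
    g_pert = adv_grad(x + sigma * torch.randn_like(x))  # one random perturbation
    return F.cosine_similarity(g_orig, g_pert, dim=1)

# Hypothetical usage: a detector could threshold the similarity score or
# feed such direction features to a binary classifier. The threshold and
# its direction below are placeholders, not values from the paper.
# scores = gradient_direction_similarity(model, batch)
# is_adversarial = scores < 0.5
```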

Related research:

03/25/2023 · AdvCheck: Characterizing Adversarial Examples via Local Gradient Checking
Deep neural networks (DNNs) are vulnerable to adversarial examples, whic...

03/08/2021 · Enhancing Transformation-based Defenses against Adversarial Examples with First-Order Perturbations
Studies show that neural networks are susceptible to adversarial attacks...

06/08/2019 · ML-LOO: Detecting Adversarial Examples with Feature Attribution
Deep neural networks obtain state-of-the-art performance on a series of ...

12/19/2019 · Mitigating large adversarial perturbations on X-MAS (X minus Moving Averaged Samples)
We propose the scheme that mitigates an adversarial perturbation ϵ on th...

09/12/2022 · Adaptive Perturbation Generation for Multiple Backdoors Detection
Extensive evidence has demonstrated that deep neural networks (DNNs) are...

10/26/2021 · Frequency Centric Defense Mechanisms against Adversarial Examples
Adversarial example (AE) aims at fooling a Convolution Neural Network by...

05/08/2021 · Self-Supervised Adversarial Example Detection by Disentangled Representation
Deep learning models are known to be vulnerable to adversarial examples ...
