Learning to Disentangle Robust and Vulnerable Features for Adversarial Detection

09/10/2019
by Byunggill Joe, et al.

Although deep neural networks have shown promising performance on various tasks, even achieving human-level performance on some, they can be fooled into incorrect predictions by imperceptibly small perturbations to an input. Many previous works have proposed to defend against such adversarial attacks, either by robust inference or by detecting adversarial inputs. Yet most of them cannot effectively defend against whitebox attacks, where an adversary has knowledge of both the model and the defense. More importantly, they do not provide a convincing explanation of why the generated adversarial inputs successfully fool the target models. To address these shortcomings of the existing approaches, we hypothesize that adversarial inputs are tied to latent features that are susceptible to adversarial perturbation, which we call vulnerable features. Based on this intuition, we propose a minimax game formulation that disentangles the latent features of each instance into robust and vulnerable ones, using a variational autoencoder with two latent spaces. We thoroughly validate our model against both blackbox and whitebox attacks on the MNIST, Fashion MNIST, and Cat & Dog datasets; the results show that adversarial inputs cannot bypass our detector without changing their semantics, in which case the attack has failed.
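To make the architecture the abstract describes more concrete, here is a minimal PyTorch-style sketch of a VAE whose encoder emits two latent codes, a robust code z_r and a vulnerable code z_v, with a classifier attached to z_r only. All names (DisentanglingVAE, head_r, head_v), layer sizes, and the exact loss terms are illustrative assumptions, not the authors' implementation; in particular, the inner maximization step of the minimax game that crafts the perturbed input x_adv (e.g., a PGD attack against the z_r classifier) is elided.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DisentanglingVAE(nn.Module):
    """Sketch of a VAE with two latent spaces: a robust latent z_r
    (used for classification) and a vulnerable latent z_v. Sizes and
    layer choices are illustrative, e.g. for 28x28 MNIST inputs."""

    def __init__(self, in_dim=784, hidden=400, z_dim=20, n_classes=10):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU())
        self.head_r = nn.Linear(hidden, 2 * z_dim)  # mu and log-variance of z_r
        self.head_v = nn.Linear(hidden, 2 * z_dim)  # mu and log-variance of z_v
        # The decoder reconstructs the input from both latents jointly.
        self.dec = nn.Sequential(
            nn.Linear(2 * z_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, in_dim), nn.Sigmoid())
        # The classifier sees only the robust latent.
        self.clf = nn.Linear(z_dim, n_classes)

    @staticmethod
    def sample(params):
        # Reparameterization trick: z = mu + sigma * eps.
        mu, logvar = params.chunk(2, dim=1)
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)
        return z, mu, logvar

    def forward(self, x):
        h = self.enc(x.flatten(1))
        z_r, mu_r, lv_r = self.sample(self.head_r(h))
        z_v, mu_v, lv_v = self.sample(self.head_v(h))
        recon = self.dec(torch.cat([z_r, z_v], dim=1))
        return recon, self.clf(z_r), (mu_r, lv_r), (mu_v, lv_v)

def kl(mu, logvar):
    # KL divergence of N(mu, sigma^2) to the standard normal prior.
    return -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())

def training_loss(model, x, x_adv, y, beta=1.0):
    """Outer (minimization) step of the minimax game. x_adv is a
    perturbed copy of x produced by an inner maximization step
    against the z_r classifier -- not shown here."""
    recon, logits, pr, pv = model(x)
    elbo = F.binary_cross_entropy(recon, x.flatten(1), reduction="sum") \
           + beta * (kl(*pr) + kl(*pv))
    # Requiring z_r to classify both clean and perturbed inputs
    # correctly pushes the effect of the perturbation into z_v.
    _, logits_adv, _, _ = model(x_adv)
    cls = F.cross_entropy(logits, y) + F.cross_entropy(logits_adv, y)
    return elbo + cls
```

The minimax intuition under these assumptions: the inner attacker perturbs x to maximize the classification loss, while the outer minimization forces z_r to remain predictive of the label, so the attack's footprint is absorbed by z_v, where a detector can then look for it.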


Related research

12/07/2020 · Learning to Separate Clusters of Adversarial Representations for Robust Adversarial Detection
Although deep neural networks have shown promising performances on vario...

08/12/2019 · Adversarial Neural Pruning
It is well known that neural networks are susceptible to adversarial per...

06/01/2021 · Improving the Adversarial Robustness for Speaker Verification by Self-Supervised Learning
Previous works have shown that automatic speaker verification (ASV) is s...

11/22/2019 · Attack Agnostic Statistical Method for Adversarial Detection
Deep Learning based AI systems have shown great promise in various domai...

02/09/2022 · Adversarial Detection without Model Information
Most prior state-of-the-art adversarial detection works assume that the ...

08/06/2019 · MetaAdvDet: Towards Robust Detection of Evolving Adversarial Attacks
Deep neural networks (DNNs) are vulnerable to adversarial attack which i...

12/05/2019 · Label-Consistent Backdoor Attacks
Deep neural networks have been demonstrated to be vulnerable to backdoor...
