Disrupting Adversarial Transferability in Deep Neural Networks

08/27/2021
by Christopher Wiedeman, et al.

Adversarial attack transferability is a well-recognized phenomenon in deep learning. Prior work has partially explained transferability by identifying common adversarial subspaces and correlations between decision boundaries, but we have found little explanation in the literature beyond this. In this paper, we propose that transferability between seemingly different models is due to a high linear correlation between the features that different deep neural networks extract. In other words, two models trained on the same task that appear distant in parameter space likely extract features in the same fashion, differing only by trivial shifts and rotations between their latent spaces. Furthermore, we show that applying a feature correlation loss, which decorrelates the extracted features in a latent space, can drastically reduce the transferability of adversarial attacks between models, suggesting that the models complete tasks in semantically different ways. Finally, we propose a Dual Neck Autoencoder (DNA), which leverages this feature correlation loss to create two meaningfully different encodings of the input information with reduced transferability.
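The abstract does not state the form of the feature correlation loss, so the following is only a minimal sketch of the general idea, assuming the loss is the mean squared Pearson cross-correlation between two batches of latent features. The function names `cross_correlation` and `feature_correlation_loss` are hypothetical, and random arrays stand in for the latent features of two real networks.

```python
import numpy as np


def cross_correlation(za, zb, eps=1e-8):
    """Pearson cross-correlation matrix between two feature batches.

    za has shape (n, da) and zb has shape (n, db); the result has shape
    (da, db). This is an illustrative assumption, not the paper's formula.
    """
    za = (za - za.mean(axis=0)) / (za.std(axis=0) + eps)
    zb = (zb - zb.mean(axis=0)) / (zb.std(axis=0) + eps)
    return za.T @ zb / za.shape[0]


def feature_correlation_loss(za, zb):
    """Mean squared cross-correlation; near 0 for linearly uncorrelated features."""
    return float(np.mean(cross_correlation(za, zb) ** 2))


rng = np.random.default_rng(0)
z = rng.normal(size=(2048, 4))

# A rotated and shifted copy of z: a latent space that looks different
# coordinate by coordinate but is linearly equivalent to the original.
q, _ = np.linalg.qr(rng.normal(size=(4, 4)))
z_rotated = z @ q + 3.0

# Independently drawn features: no linear relationship with z.
z_independent = rng.normal(size=(2048, 4))

loss_rotated = feature_correlation_loss(z, z_rotated)
loss_independent = feature_correlation_loss(z, z_independent)
print(f"rotated copy: {loss_rotated:.4f}  independent: {loss_independent:.4f}")
```

In this toy setting the rotated copy yields a much larger loss than the independent features, which illustrates why a penalty of this kind would push two encoders away from latent spaces that differ only by shifts and rotations. In training, such a term would presumably be added to the task loss with a weighting coefficient; that weighting is likewise an assumption here.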


