Ensemble Adversarial Training: Attacks and Defenses

by Florian Tramèr, et al.

Machine learning models are vulnerable to adversarial examples, inputs maliciously perturbed to mislead the model. These inputs transfer between models, thus enabling black-box attacks against deployed models. Adversarial training increases robustness to attacks by injecting adversarial examples into training data. Surprisingly, we find that although adversarially trained models exhibit strong robustness to some white-box attacks (i.e., with knowledge of the model parameters), they remain highly vulnerable to transferred adversarial examples crafted on other models. We show that this vulnerability arises because the model's decision surface exhibits sharp curvature in the vicinity of the data points, which hinders attacks based on first-order approximations of the model's loss but permits black-box attacks that use adversarial examples transferred from another model. We harness this observation in two ways: First, we propose a simple yet powerful new attack that applies a small random perturbation to an input before finding the optimal perturbation under a first-order approximation. Our attack outperforms prior "single-step" attacks on models trained with or without adversarial training. Second, we propose Ensemble Adversarial Training, an extension of adversarial training that additionally augments training data with perturbed inputs transferred from a number of fixed pre-trained models. On MNIST and ImageNet, Ensemble Adversarial Training vastly improves robustness to black-box attacks.
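The "random step then first-order step" attack described above can be sketched in a few lines of NumPy. This is an illustrative reconstruction, not the authors' code: the function name `rand_fgsm`, the toy logistic-regression loss, and the parameter values are all assumptions chosen to make the sketch self-contained.

```python
import numpy as np

def rand_fgsm(x, y, loss_grad, eps, alpha, rng=None):
    """Sketch of a randomized single-step attack (names illustrative):
    take a small random step of size alpha, then a single sign-of-gradient
    step of size (eps - alpha), keeping the result in the eps-ball."""
    rng = np.random.default_rng() if rng is None else rng
    # 1) small random perturbation to escape the sharply curved region
    x_rand = x + alpha * np.sign(rng.standard_normal(x.shape))
    # 2) single FGSM-style step under a first-order approximation of the loss
    grad = loss_grad(x_rand, y)
    x_adv = x_rand + (eps - alpha) * np.sign(grad)
    # project back into the eps-ball around x and into the valid input range
    return np.clip(np.clip(x_adv, x - eps, x + eps), 0.0, 1.0)

# Toy target model (an assumption for the sketch): logistic regression
# with fixed weights w, loss(x, y) = log(1 + exp(-y * w.x)).
w = np.array([1.0, -2.0, 0.5])

def loss_grad(x, y):
    # analytic gradient of the logistic loss w.r.t. the input x
    margin = y * w.dot(x)
    return -y * w / (1.0 + np.exp(margin))

x = np.array([0.5, 0.5, 0.5])
x_adv = rand_fgsm(x, y=1.0, loss_grad=loss_grad, eps=0.1, alpha=0.05)
```

The initial random step means the gradient is taken at a point slightly away from the data point, where the loss surface is smoother, which is what makes this single-step attack stronger than plain FGSM against adversarially trained models.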


