Cascade Adversarial Machine Learning Regularized with a Unified Embedding

by   Taesik Na, et al.

Deep neural network classifiers are vulnerable to small input perturbations carefully generated by the adversaries. Injecting adversarial inputs during training, known as adversarial training, can improve robustness against one-step attacks, but not for unknown iterative attacks. To address this challenge, we propose to utilize embedding space for both classification and low-level (pixel-level) similarity learning to ignore unknown pixel level perturbation. During training, we inject adversarial images without replacing their corresponding clean images and penalize the distance between the two embeddings (clean and adversarial). This additional regularization encourages two similar images (clean and perturbed versions) to produce the same outputs, not necessarily the true labels, enhancing classifier's robustness against pixel level perturbation. Next, we show iteratively generated adversarial images easily transfer between networks trained with the same strategy. Inspired by this observation, we also propose cascade adversarial training, which transfers the knowledge of the end results of adversarial training. We train a network from scratch by injecting iteratively generated adversarial images crafted from already defended networks in addition to one-step adversarial images from the network being trained. Experimental results show that cascade adversarial training together with our proposed low-level similarity learning efficiently enhance the robustness against iterative attacks, but at the expense of decreased robustness against one-step attacks. We show that combining those two techniques can also improve robustness under the worst case black box attack scenario.


page 1

page 2

page 3

page 4


Gray-box Adversarial Training

Adversarial samples are perturbed inputs crafted to mislead the machine ...

Ensemble Adversarial Training: Attacks and Defenses

Machine learning models are vulnerable to adversarial examples, inputs m...

Combating Adversaries with Anti-Adversaries

Deep neural networks are vulnerable to small input perturbations known a...

A Perturbation Resistant Transformation and Classification System for Deep Neural Networks

Deep convolutional neural networks accurately classify a diverse range o...

Robustness and Generalization via Generative Adversarial Training

While deep neural networks have achieved remarkable success in various c...

CAP-GAN: Towards Adversarial Robustness with Cycle-consistent Attentional Purification

Adversarial attack is aimed at fooling the target classifier with imperc...

Enhancing Intrinsic Adversarial Robustness via Feature Pyramid Decoder

Whereas adversarial training is employed as the main defence strategy ag...

Please sign up or login with your details

Forgot password? Click here to reset