Purifying Adversarial Perturbation with Adversarially Trained Auto-encoders

05/26/2019
by Hebi Li, et al.

Machine learning models are vulnerable to adversarial examples. Iterative adversarial training has shown promising results against strong white-box attacks. However, adversarial training is very expensive, and this expensive training must be repeated for every model that needs protection. In this paper, we propose applying the iterative adversarial training scheme to an external auto-encoder, which, once trained, can be used directly to protect other models. We empirically show that our model outperforms other purification-based methods against white-box attacks and transfers well to directly protect base models with different architectures.
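
To make the approach concrete, below is a minimal PyTorch sketch of adversarially training an external auto-encoder ("purifier") so that a frozen, pretrained classifier correctly labels the purified version of an adversarial input. The PGD attacker, the network shapes (single-channel 28x28 inputs), and all names and hyperparameters here are illustrative assumptions, not the paper's exact configuration.

import torch
import torch.nn as nn
import torch.nn.functional as F

class Purifier(nn.Module):
    """Small convolutional auto-encoder that maps perturbed inputs
    back toward the clean data manifold (illustrative architecture)."""
    def __init__(self):
        super().__init__()
        self.enc = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.dec = nn.Sequential(
            nn.ConvTranspose2d(32, 16, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(16, 1, 4, stride=2, padding=1), nn.Sigmoid(),
        )

    def forward(self, x):
        return self.dec(self.enc(x))

def pgd_attack(model, x, y, eps=0.3, alpha=0.01, steps=40):
    """Iterative L_inf PGD attack against the composed model
    (white-box: gradients flow through purifier and classifier)."""
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1).detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv.detach() + alpha * grad.sign()
        # Project back into the eps-ball around x and the valid pixel range.
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0, 1)
    return x_adv.detach()

def train_purifier(purifier, classifier, loader, epochs=10, device="cpu"):
    """Adversarial training loop: only the purifier's weights are updated;
    the (assumed pretrained) classifier stays frozen."""
    classifier.eval()
    for p in classifier.parameters():
        p.requires_grad_(False)
    composed = nn.Sequential(purifier, classifier)
    opt = torch.optim.Adam(purifier.parameters(), lr=1e-3)
    for _ in range(epochs):
        for x, y in loader:
            x, y = x.to(device), y.to(device)
            # Attack the full purifier+classifier pipeline, then train the
            # purifier so the frozen classifier recovers the correct label.
            x_adv = pgd_attack(composed, x, y)
            opt.zero_grad()
            loss = F.cross_entropy(classifier(purifier(x_adv)), y)
            loss.backward()
            opt.step()
    return purifier

Because only the purifier's weights are updated, the trained auto-encoder can afterwards be placed in front of a different base model without retraining that model, which is the transfer property the abstract describes.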
