Feature Denoising for Improving Adversarial Robustness

12/09/2018
by Cihang Xie, et al.

Adversarial attacks on image classification systems present challenges to convolutional networks and opportunities for understanding them. This study suggests that adversarial perturbations on images lead to noise in the features constructed by these networks. Motivated by this observation, we develop new network architectures that increase adversarial robustness by performing feature denoising. Specifically, our networks contain blocks that denoise the features using non-local means or other filters; the entire networks are trained end-to-end. When combined with adversarial training, our feature denoising networks substantially improve the state-of-the-art in adversarial robustness in both white-box and black-box attack settings. On ImageNet, under 10-iteration PGD white-box attacks where prior art has 27.9% accuracy, our method achieves 55.7%; even under extreme 2000-iteration PGD white-box attacks, our method secures 42.6% accuracy. Our method was ranked first in the Competition on Adversarial Attacks and Defenses (CAAD) 2018: it achieved 50.6% classification accuracy on a secret, ImageNet-like test dataset against 48 unknown attackers, surpassing the runner-up approach by roughly 10%. Code and models will be made publicly available.
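The core mechanism admits a short sketch. Below is a minimal PyTorch rendition of one feature-denoising block: a non-local means filter over the feature map (here a softmax over dot-product affinities, i.e. the non-embedded Gaussian variant), followed by a 1x1 convolution and a residual connection back to the input. The class name, affinity choice, and configuration are illustrative assumptions, not the authors' released code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DenoisingBlock(nn.Module):
    """Sketch of a feature-denoising block: non-local means filtering,
    then a 1x1 conv, then a residual connection. Names and settings
    are assumptions, not the released configuration."""

    def __init__(self, channels: int):
        super().__init__()
        # 1x1 conv applied to the denoised features before the residual add
        self.conv = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        n, c, h, w = x.shape
        feats = x.reshape(n, c, h * w)                      # (N, C, HW)
        # Affinity between every pair of spatial positions
        affinity = torch.bmm(feats.transpose(1, 2), feats)  # (N, HW, HW)
        weights = F.softmax(affinity, dim=-1)
        # Each output position is a weighted mean over all positions
        denoised = torch.bmm(feats, weights.transpose(1, 2)).reshape(n, c, h, w)
        return x + self.conv(denoised)                      # residual add
```

For example, DenoisingBlock(256)(torch.randn(2, 256, 14, 14)) filters a 256-channel, 14x14 feature map; in the paper, such blocks are interleaved with the backbone's residual stages and the whole network is trained end-to-end.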
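For context on the evaluation protocol, the following is a minimal sketch of a 10-iteration L-infinity PGD white-box attack, assuming a model over images scaled to [0, 1]; the eps and alpha values are illustrative assumptions, not the paper's exact settings.

```python
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=16 / 255.0, alpha=2 / 255.0, steps=10):
    # Iterated gradient-sign ascent on the loss, projected onto the
    # L-infinity ball of radius eps around the clean input x.
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        (grad,) = torch.autograd.grad(loss, x_adv)
        x_adv = x_adv.detach() + alpha * grad.sign()
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps)  # project
        x_adv = x_adv.clamp(0.0, 1.0)                          # valid pixels
    return x_adv
```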
