Provably Robust Deep Learning via Adversarially Trained Smoothed Classifiers

06/09/2019
by   Hadi Salman, et al.

Recent works have shown the effectiveness of randomized smoothing as a scalable technique for building neural network-based classifiers that are provably robust to ℓ_2-norm adversarial perturbations. In this paper, we employ adversarial training to improve the performance of randomized smoothing. We design an adapted attack for smoothed classifiers, and we show how this attack can be used in an adversarial training setting to boost the provable robustness of smoothed classifiers. We demonstrate through extensive experimentation that our method consistently outperforms all existing provably ℓ_2-robust classifiers by a significant margin on ImageNet and CIFAR-10, establishing the state-of-the-art for provable ℓ_2-defenses. Our code and trained models are available at http://github.com/Hadisalman/smoothing-adversarial .
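To make the randomized-smoothing setup concrete, here is a minimal toy sketch of prediction and ℓ_2 certification with a smoothed classifier, in the spirit of the certification procedure the abstract builds on. This is not the paper's implementation: `base_classifier` is a hypothetical stand-in for a trained network, and the true procedure uses a Clopper-Pearson lower confidence bound on the top-class probability rather than the raw empirical frequency used here.

```python
import numpy as np
from statistics import NormalDist


def base_classifier(x):
    # Hypothetical stand-in for a trained network f:
    # a toy binary rule on the sum of the input.
    return int(x.sum() >= 0.0)


def smoothed_predict(f, x, sigma=0.5, n=1000, seed=0):
    """Predict with the smoothed classifier g(x) = argmax_c P(f(x+d)=c),
    d ~ N(0, sigma^2 I), and return a certified l2 radius.

    Assumes binary classes {0, 1}. The radius formula is the
    Cohen et al. bound R = sigma * Phi^{-1}(pA) (taking pB <= 1 - pA);
    a real implementation would lower-bound pA with a confidence interval.
    """
    rng = np.random.default_rng(seed)
    counts = np.zeros(2, dtype=int)
    for _ in range(n):
        noisy = x + rng.normal(0.0, sigma, size=x.shape)
        counts[f(noisy)] += 1
    top = int(counts.argmax())
    p_a = counts[top] / n  # empirical top-class probability
    radius = sigma * NormalDist().inv_cdf(p_a) if p_a > 0.5 else 0.0
    return top, radius


x = np.array([1.0, 0.5, 0.25])
cls, radius = smoothed_predict(base_classifier, x)
```

The paper's contribution is orthogonal to this certification step: it trains the base classifier with an adapted adversarial attack on the smoothed classifier, so that the probabilities `p_a` (and hence the certified radii) come out larger at test time.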


Related research

- 02/08/2019 · Certified Adversarial Robustness via Randomized Smoothing
  Recent work has shown that any classifier which classifies well under Ga...
- 01/08/2020 · MACER: Attack-free and Scalable Robust Training via Maximizing Certified Radius
  Adversarial training is one of the most popular ways to learn robust mod...
- 05/09/2020 · Provable Robust Classification via Learned Smoothed Densities
  Smoothing classifiers and probability density functions with Gaussian ke...
- 05/31/2018 · Scaling provable adversarial defenses
  Recent work has developed methods for learning deep network classifiers ...
- 01/29/2023 · Improving the Accuracy-Robustness Trade-off of Classifiers via Adaptive Smoothing
  While it is shown in the literature that simultaneously accurate and rob...
- 05/19/2020 · Enhancing Certified Robustness of Smoothed Classifiers via Weighted Model Ensembling
  Randomized smoothing has achieved state-of-the-art certified robustness ...
- 08/29/2022 · Reducing Certified Regression to Certified Classification
  Adversarial training instances can severely distort a model's behavior. ...
