RobustBench: a standardized adversarial robustness benchmark

by Francesco Croce, et al.

Evaluation of adversarial robustness is often error-prone, leading to overestimation of the true robustness of models. While adaptive attacks designed for a particular defense are a way out of this, there are only approximate guidelines on how to perform them. Moreover, adaptive evaluations are highly customized for particular models, which makes it difficult to compare different defenses. Our goal is to establish a standardized benchmark of adversarial robustness that reflects the robustness of the considered models as accurately as possible within a reasonable computational budget. This requires imposing some restrictions on the admitted models to rule out defenses that only make gradient-based attacks ineffective without improving actual robustness. We evaluate the robustness of models for our benchmark with AutoAttack, an ensemble of white- and black-box attacks which was recently shown in a large-scale study to improve almost all robustness evaluations compared to the original publications. Our leaderboard, hosted at, aims at reflecting the current state of the art on a set of well-defined tasks in ℓ_∞- and ℓ_2-threat models with possible extensions in the future. Additionally, we open-source the library that provides unified access to state-of-the-art robust models to facilitate their downstream applications. Finally, based on the collected models, we analyze general trends in ℓ_p-robustness and its impact on other tasks such as robustness to various distribution shifts and out-of-distribution detection.
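
To make the idea of evaluating with an ensemble of attacks concrete, the following is a minimal sketch of how a worst-case robust accuracy could be computed from per-attack results. It assumes that an input counts as robust only if it is classified correctly and no attack in the ensemble finds an adversarial example for it. The function name `robust_accuracy` and the toy data are hypothetical and are not taken from the benchmark's library.

```python
from typing import Sequence


def robust_accuracy(clean_correct: Sequence[bool],
                    attack_success: Sequence[Sequence[bool]]) -> float:
    """Worst-case accuracy over an ensemble of attacks (illustrative only).

    clean_correct[i]     -- the model classifies unperturbed input i correctly
    attack_success[a][i] -- attack a found an adversarial example for input i
    """
    n = len(clean_correct)
    robust = 0
    for i in range(n):
        # An input is broken as soon as any single attack succeeds on it.
        broken = any(per_attack[i] for per_attack in attack_success)
        if clean_correct[i] and not broken:
            robust += 1
    return robust / n


# Toy data (assumed, not benchmark results): four inputs, two attacks.
clean = [True, True, True, False]
attacks = [
    [True, False, False, False],  # hypothetical white-box attack
    [False, True, False, False],  # hypothetical black-box attack
]

print(robust_accuracy(clean, attacks))       # both attacks combined
print(robust_accuracy(clean, attacks[:1]))   # first attack alone
```

In this toy case each attack alone leaves two of the four inputs standing, while the ensemble leaves only one. This illustrates why a single weak attack can overestimate robustness.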



Indicators of Attack Failure: Debugging and Improving Optimization of Adversarial Examples

Evaluating robustness of machine-learning models to adversarial examples...

Beware the Black-Box: on the Robustness of Recent Defenses to Adversarial Examples

Recent defenses published at venues like NIPS, ICML, ICLR and CVPR are m...

MORA: Improving Ensemble Robustness Evaluation with Model-Reweighing Attack

Adversarial attacks can deceive neural networks by adding tiny perturbat...

On the Robustness of Latent Diffusion Models

Latent diffusion models achieve state-of-the-art performance on a variet...

Graph Robustness Benchmark: Benchmarking the Adversarial Robustness of Graph Machine Learning

Adversarial attacks on graphs have posed a major threat to the robustnes...

On the Robustness of ChatGPT: An Adversarial and Out-of-distribution Perspective

ChatGPT is a recent chatbot service released by OpenAI and is receiving ...
