Minimally distorted Adversarial Examples with a Fast Adaptive Boundary Attack

07/03/2019
by Francesco Croce, et al.

The robustness of neural network-based classifiers against adversarial manipulation is mainly evaluated with empirical attacks, since methods for exact computation, even when available, do not scale to large networks. In this paper we propose a new white-box adversarial attack with respect to the l_p-norms for p ∈ {1, 2, ∞}, aiming at finding the minimal perturbation necessary to change the class of a given input. The attack has an intuitive geometric meaning, yields high-quality results already with one restart, and minimizes the size of the perturbation, so that the robust accuracy can be evaluated at all possible thresholds with a single run; it comes with almost no free parameters apart from the number of iterations and restarts. It achieves better or similar robust test accuracy compared to state-of-the-art attacks, which are partially specialized to one l_p-norm.
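To make the "all thresholds with a single run" property concrete, here is a minimal sketch: it assumes a minimal-perturbation attack (such as the one proposed here) has already returned, for each test point, the l_p-norm of the smallest class-changing perturbation it found. The arrays and the `robust_accuracy` helper are hypothetical illustrations, not the paper's code; robust accuracy at a threshold eps is simply the fraction of points that are both correctly classified and whose minimal perturbation exceeds eps.

```python
import numpy as np

# Hypothetical output of a single run of a minimal-perturbation attack:
# for each test point, the l_p-norm of the smallest adversarial perturbation
# found (np.inf if none was found), plus a mask of points the model
# classified correctly on clean data.
min_pert_norms = np.array([0.5, 2.1, np.inf, 0.9, 1.7])  # illustrative values
clean_correct = np.array([True, True, True, False, True])

def robust_accuracy(eps: float) -> float:
    """Fraction of the test set that is correctly classified AND whose
    minimal adversarial perturbation is larger than the threshold eps."""
    robust = clean_correct & (min_pert_norms > eps)
    return float(robust.mean())

# Because the attack minimizes the perturbation size, one run suffices to
# trace out the entire robustness curve over any set of thresholds.
for eps in (0.25, 1.0, 2.0):
    print(f"robust accuracy at eps={eps}: {robust_accuracy(eps):.2f}")
```

By contrast, a fixed-budget attack evaluated at a single perturbation size would have to be re-run once per threshold to produce the same curve.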
