Testing Robustness Against Unforeseen Adversaries

08/21/2019
by Daniel Kang, et al.

Considerable work on adversarial defense has studied robustness to a fixed, known family of adversarial distortions, most frequently L_p-bounded distortions. In reality, the specific form of an attack will rarely be known, and adversaries are free to employ distortions outside of any fixed set. The present work advocates measuring robustness against this much broader range of unforeseen attacks: attacks whose precise form is not known when the defense is designed. We propose a methodology for evaluating a defense against a diverse range of distortion types, together with a summary metric, UAR, that measures Unforeseen Attack Robustness against a distortion. We construct novel JPEG, Fog, Gabor, and Snow adversarial attacks to simulate unforeseen adversaries and perform a careful study of adversarial robustness against these and existing distortion types. We find that evaluation against existing L_p attacks yields highly correlated information that may not generalize to other attacks, and we identify a set of four attacks that yields more diverse information. We further find that adversarial training against either one or multiple distortions, including our novel ones, does not confer robustness to unforeseen distortions. These results underscore the need to study robustness against unforeseen distortions and provide a starting point for doing so.
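The abstract leaves the UAR formula to the paper itself. The sketch below shows one way such a summary metric can be computed, assuming accuracies have already been measured at a fixed set of calibrated distortion sizes: the defense's accuracies against an attack are summed over those sizes and normalized by the corresponding accuracies of a model adversarially trained against that same attack. The function name, its arguments, and the example numbers are illustrative, not the authors' reference implementation.

    import numpy as np

    def uar(defense_accuracies, adv_trained_accuracies):
        """Sketch of an Unforeseen Attack Robustness (UAR) style score.

        defense_accuracies     -- accuracy of the evaluated defense against
                                  one attack at each calibrated distortion
                                  size eps_1..eps_k
        adv_trained_accuracies -- accuracy, at the same sizes, of a model
                                  adversarially trained against that attack

        Returns a score scaled to roughly 0-100, where values near 100 mean
        the defense matches the attack-specific adversarially trained model.
        """
        defense = np.asarray(defense_accuracies, dtype=float)
        baseline = np.asarray(adv_trained_accuracies, dtype=float)
        return 100.0 * defense.sum() / baseline.sum()

    # Hypothetical example: a defense evaluated against, say, the Fog attack
    # at six calibrated sizes vs. a Fog-adversarially-trained baseline.
    print(uar([0.71, 0.62, 0.50, 0.38, 0.25, 0.14],
              [0.78, 0.70, 0.61, 0.52, 0.43, 0.35]))  # ~76.7

Normalizing by the adversarially trained baseline, rather than reporting raw accuracy, makes scores comparable across attacks of very different strengths.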


Related research

09/11/2020  Defending Against Multiple and Unforeseen Adversarial Videos
Adversarial examples of deep neural networks have been actively investig...

08/10/2022  Reducing Exploitability with Population Based Training
Self-play reinforcement learning has achieved state-of-the-art, and ofte...

07/12/2021  A Closer Look at the Adversarial Robustness of Information Bottleneck Models
We study the adversarial robustness of information bottleneck models for...

09/05/2020  Dual Manifold Adversarial Robustness: Defense against Lp and non-Lp Adversarial Attacks
Adversarial training is a popular defense strategy against attack threat...

05/31/2020  Evaluations and Methods for Explanation through Robustness Analysis
Among multiple ways of interpreting a machine learning model, measuring ...

06/08/2019  Strategies to architect AI Safety: Defense to guard AI from Adversaries
The impact of designing for security of AI is critical for humanity in t...

07/20/2023  A LLM Assisted Exploitation of AI-Guardian
Large language models (LLMs) are now highly capable at a diverse range o...
