PatchGuard: Provable Defense against Adversarial Patches Using Masks on Small Receptive Fields

by Chong Xiang, et al.
Princeton University

Localized adversarial patches aim to induce misclassification in machine learning models by arbitrarily modifying pixels within a restricted region of an image. Such attacks can be realized in the physical world by attaching the adversarial patch to the object to be misclassified. In this paper, we propose a general defense framework that can achieve both high clean accuracy and provable robustness against localized adversarial patches. The cornerstone of our defense framework is the use of a convolutional network with small receptive fields, which bounds the number of features that an adversarial patch can corrupt. We further present the robust masking defense, which robustly detects and masks corrupted features for secure feature aggregation. We evaluate our defense against the most powerful white-box untargeted adaptive attacker and achieve a 92.3% clean accuracy and a 61.3% provable robust accuracy on a 10-class subset of ImageNet against a 31x31 adversarial patch (2% of pixels), and a 57.4% clean accuracy on 1000-class ImageNet against a 31x31 patch (2% of pixels). Notably, our provable defense achieves state-of-the-art provable robust accuracy on ImageNet and CIFAR-10.
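The core idea described above can be illustrated with a small sketch: because a CNN with small receptive fields confines the patch's influence to a bounded window of the feature map, a defense can mask the most suspicious window before aggregating class evidence. The following is a minimal, hypothetical simplification of such robust masking (the paper applies a more involved per-class procedure); the function name, the single-channel evidence map, and the exhaustive window search are assumptions for illustration.

```python
import numpy as np

def robust_masking(feature_map, window_size):
    """Mask the highest-sum window in a 2D class-evidence map, then aggregate.

    feature_map: (H, W) array of non-negative class evidence produced by a
        CNN with small receptive fields.
    window_size: upper bound (in feature-map cells) on the region a single
        adversarial patch can corrupt, implied by the receptive-field size.
    Returns the aggregated (summed) class evidence after masking.
    """
    H, W = feature_map.shape
    best_sum, best_pos = -1.0, (0, 0)
    # Exhaustively scan all windows; the one with the largest evidence sum
    # is treated as the potentially corrupted region.
    for i in range(H - window_size + 1):
        for j in range(W - window_size + 1):
            s = feature_map[i:i + window_size, j:j + window_size].sum()
            if s > best_sum:
                best_sum, best_pos = s, (i, j)
    masked = feature_map.copy()
    i, j = best_pos
    masked[i:i + window_size, j:j + window_size] = 0.0  # mask suspected patch
    return masked.sum()
```

For example, a feature map of ones with one 2x2 block of abnormally large activations would have that block zeroed out, so the patched region cannot dominate the aggregated prediction.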

PatchGuard++: Efficient Provable Attack Detection against Adversarial Patches

An adversarial patch can arbitrarily manipulate image pixels within a re...

PatchCleanser: Certifiably Robust Defense against Adversarial Patches for Any Image Classifier

The adversarial patch attack against image classification models aims to...

Zero-Shot Certified Defense against Adversarial Patches with Vision Transformers

Adversarial patch attack aims to fool a machine learning model by arbitr...

ScaleCert: Scalable Certified Defense against Adversarial Patches with Sparse Superficial Layers

Adversarial patch attacks that craft the pixels in a confined region of ...

Revisiting Image Classifier Training for Improved Certified Robust Defense against Adversarial Patches

Certifiably robust defenses against adversarial patches for image classi...

Towards Practical Certifiable Patch Defense with Vision Transformer

Patch attacks, one of the most threatening forms of physical attack in a...

On Provable Backdoor Defense in Collaborative Learning

As collaborative learning allows joint training of a model using multipl...
