Resilience from Diversity: Population-based approach to harden models against adversarial attacks

by Jasser Jasser, et al.

Traditional deep learning models exhibit intriguing vulnerabilities that allow an attacker to force them to fail at their task. Notorious attacks such as the Fast Gradient Sign Method (FGSM) and the more powerful Projected Gradient Descent (PGD) generate adversarial examples by adding a perturbation of magnitude ϵ along the input's computed gradient, degrading the effectiveness of the model's classification. This work introduces a model that is resilient to adversarial attacks. Our model leverages a well-established principle from the biological sciences: population diversity produces resilience against environmental changes. More precisely, our model consists of a population of n diverse submodels, each trained to individually obtain high accuracy on the task at hand while forced to maintain meaningful differences in their weight tensors. Each time our model receives a classification query, it selects a submodel from its population at random to answer the query. To introduce and maintain diversity in the population of submodels, we introduce the concept of counter-linking weights. A Counter-Linked Model (CLM) consists of submodels of the same architecture in which a periodic random similarity examination is conducted during simultaneous training to guarantee diversity while maintaining accuracy. In our testing, CLM robustness was enhanced by around 20% on the MNIST dataset and by at least 15% on the CIFAR-10 dataset. Implemented with adversarially trained submodels, this methodology achieves state-of-the-art robustness: on the MNIST dataset with ϵ=0.3 it achieved 94.34% robustness, and on the CIFAR-10 dataset with ϵ=8/255 it achieved 62.97% robustness.
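To make the mechanism concrete, the sketch below illustrates, in PyTorch, the ideas the abstract describes: an FGSM-style perturbation of magnitude ϵ along the sign of the input gradient, random submodel selection at query time, and a periodic random similarity examination over the population. This is a minimal illustration under stated assumptions, not the authors' code: the names DiversePopulation, similarity_check, and fgsm_example, the cosine-similarity measure, the 0.9 threshold, and the re-initialization used to restore diversity are all assumptions made for the example; the paper's actual counter-linking of weights is not reproduced here.

import random
import torch
import torch.nn as nn
import torch.nn.functional as F

def fgsm_example(model, x, y, epsilon):
    # FGSM as described above: perturb the input by epsilon in the
    # direction of the sign of the input gradient of the loss.
    x = x.clone().detach().requires_grad_(True)
    F.cross_entropy(model(x), y).backward()
    return (x + epsilon * x.grad.sign()).detach()

class DiversePopulation(nn.Module):
    # A population of n submodels of the same architecture; each
    # classification query is answered by one submodel chosen
    # uniformly at random, as in the abstract.
    def __init__(self, make_submodel, n):
        super().__init__()
        self.submodels = nn.ModuleList(make_submodel() for _ in range(n))

    def forward(self, x):
        return random.choice(self.submodels)(x)

    @torch.no_grad()
    def similarity_check(self, threshold=0.9):
        # Periodic random similarity examination (illustrative): compare
        # the flattened weights of a random pair of submodels and, if
        # they are too similar, re-initialize one of them. The threshold
        # and the re-initialization are stand-ins for the paper's
        # counter-linking mechanism.
        a, b = random.sample(list(self.submodels), 2)
        wa = torch.cat([p.flatten() for p in a.parameters()])
        wb = torch.cat([p.flatten() for p in b.parameters()])
        if F.cosine_similarity(wa, wb, dim=0) > threshold:
            for p in b.parameters():
                if p.dim() > 1:  # re-initialize weight matrices only
                    nn.init.kaiming_uniform_(p)

Usage, with a hypothetical population of small MNIST classifiers: query the population normally, and call the similarity check every few steps during the submodels' simultaneous training.

make = lambda: nn.Sequential(nn.Flatten(), nn.Linear(784, 128),
                             nn.ReLU(), nn.Linear(128, 10))
population = DiversePopulation(make, n=5)
logits = population(torch.randn(8, 1, 28, 28))  # may hit any submodel
population.similarity_check()                   # run periodically in training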


Related Research

Adversarial Attacks in Sound Event Classification

Adversarial attacks refer to a set of methods that perturb the input to ...

Improved Adversarial Robustness by Reducing Open Space Risk via Tent Activations

Adversarial examples contain small perturbations that can remain imperce...

Diversified Adversarial Attacks based on Conjugate Gradient Method

Deep learning models are vulnerable to adversarial examples, and adversa...

AutoGAN: Robust Classifier Against Adversarial Attacks

Classifiers fail to classify correctly input images that have been purpo...

A Hybrid Defense Method against Adversarial Attacks on Traffic Sign Classifiers in Autonomous Vehicles

Adversarial attacks can make deep neural network (DNN) models predict in...

MockingBERT: A Method for Retroactively Adding Resilience to NLP Models

Protecting NLP models against misspellings whether accidental or adversa...
