Detecting Universal Trigger's Adversarial Attack with Honeypot

11/20/2020
by Thai Le, et al.

The Universal Trigger (UniTrigger) is a recently proposed, powerful adversarial textual attack method. Using a learning-based mechanism, UniTrigger generates a fixed phrase that, when added to any benign input, can drop the prediction accuracy of a textual neural network (NN) model to near zero on a target class. To defend against this new attack method, which may cause significant harm, we borrow the "honeypot" concept from the cybersecurity community and propose DARCY, a honeypot-based defense framework. DARCY adaptively searches for and injects multiple trapdoors into an NN model to "bait and catch" potential attacks. Through comprehensive experiments across five public datasets, we demonstrate that DARCY detects UniTrigger's adversarial attacks with up to 99 [...] difference of only around 2 [...] inputs. We also show that DARCY with multiple trapdoors is robust under different assumptions with respect to attackers' knowledge and skills.

