Attention Hijacking in Trojan Transformers

08/09/2022
by Weimin Lyu, et al.

Trojan attacks pose a severe threat to AI systems. Transformer models have recently surged in popularity, and the importance of self-attention is now indisputable. This raises a central question: can we reveal Trojans through the attention mechanisms in BERTs and ViTs? In this paper, we investigate the attention hijacking pattern in Trojan AIs, i.e., the trigger token “kidnaps” the attention weights when a specific trigger is present. We observe a consistent attention hijacking pattern in Trojan Transformers from both the Natural Language Processing (NLP) and Computer Vision (CV) domains. This intriguing property helps us understand the Trojan mechanism in BERTs and ViTs. We also propose an Attention-Hijacking Trojan Detector (AHTD) to discriminate Trojan AIs from clean ones.
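To make the notion of attention hijacking concrete, the following is a minimal sketch in plain Python. It is not the paper's AHTD and not the authors' measurement procedure. It only illustrates one simple way to quantify how much attention the other tokens give to a single position in one attention head. The function name `attention_to_token` and both toy matrices are hypothetical, and each row of a matrix is assumed to be one token's attention distribution (summing to 1).

```python
from typing import List

Matrix = List[List[float]]  # attn[i][j]: weight that token i assigns to token j


def attention_to_token(attn: Matrix, token_idx: int) -> float:
    """Mean attention weight that all *other* tokens assign to token_idx.

    Hypothetical helper for illustration only; the row for token_idx itself
    is skipped so that self-attention does not inflate the score.
    """
    others = [row for i, row in enumerate(attn) if i != token_idx]
    return sum(row[token_idx] for row in others) / len(others)


# Toy head with no preferred token (assumed "clean-like" behaviour).
uniform_head: Matrix = [[0.25, 0.25, 0.25, 0.25] for _ in range(4)]

# Toy head where position 2 (an assumed trigger position) absorbs most weight.
hijacked_head: Matrix = [
    [0.10, 0.10, 0.70, 0.10],
    [0.05, 0.15, 0.75, 0.05],
    [0.10, 0.10, 0.70, 0.10],
    [0.10, 0.05, 0.80, 0.05],
]

TRIGGER_IDX = 2  # assumed position of the trigger token in the toy input

uniform_score = attention_to_token(uniform_head, TRIGGER_IDX)
hijacked_score = attention_to_token(hijacked_head, TRIGGER_IDX)
print(f"uniform head:  {uniform_score:.2f}")
print(f"hijacked head: {hijacked_score:.2f}")
```

In a real model the matrices would come from the attention outputs of a BERT or ViT layer rather than being written by hand, and a detector would need to aggregate such scores over heads, layers, and inputs. The abstract does not describe how AHTD does this, so none of that is shown here.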

