Detecting AI Trojans Using Meta Neural Analysis

10/08/2019
by Xiaojun Xu, et al.

Machine learning models, especially neural networks (NNs), have achieved outstanding performance on diverse and complex applications. However, recent work has found that they are vulnerable to Trojan attacks, where an adversary trains a corrupted model on poisoned data or directly manipulates its parameters in a stealthy way. Such Trojaned models perform well on normal data at test time while predicting incorrectly on adversarially manipulated samples. This paper aims to develop ways to detect Trojaned models. We mainly explore the idea of meta neural analysis, a technique that trains a meta NN model to predict whether or not a target NN model has certain properties. We develop MNTD (Meta Neural Trojan Detection), a novel pipeline that predicts whether a given NN is Trojaned via meta neural analysis on a set of trained shadow models. We propose two ways to train the meta-classifier without knowing the Trojan attacker's strategies. The first, one-class learning, fits a novelty-detection meta-classifier using only benign neural networks. The second, jumbo learning, approximates a general distribution of Trojaned models, samples a "jumbo" set of Trojaned models to train the meta-classifier, and evaluates it on unseen Trojan strategies. Extensive experiments demonstrate the effectiveness of MNTD in detecting different Trojan attacks in diverse areas such as vision, speech, tabular data, and natural language processing. We show that MNTD reaches an average detection AUC (Area Under the ROC Curve) score of 97% and outperforms existing approaches. Furthermore, we design and evaluate a robust version of MNTD against strong adaptive attackers who have full knowledge of the detection system, demonstrating the robustness of MNTD.
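The pipeline described above lends itself to a compact sketch. The toy PyTorch example below illustrates the core idea: build a set of shadow models (half of them standing in for jumbo-sampled Trojaned models), represent each model by its outputs on a shared query set, and fit a meta-classifier on those query-response features. The model sizes, query count, and label assignment are illustrative assumptions rather than the paper's configuration; in particular, real MNTD trains each shadow model on clean or jumbo-poisoned data, whereas here the shadow models and their Trojan labels are faked for brevity.

```python
# Minimal sketch of MNTD-style meta neural analysis on a toy tabular task.
# Shadow-model training, the jumbo trigger distribution, and the
# meta-classifier architecture are illustrative stand-ins, not the
# authors' exact implementation.
import torch
import torch.nn as nn

D_IN, N_CLASSES, N_QUERIES = 20, 2, 8

def make_shadow_model():
    """A small 'shadow' classifier standing in for a trained model."""
    return nn.Sequential(nn.Linear(D_IN, 32), nn.ReLU(),
                         nn.Linear(32, N_CLASSES))

def feature_vector(model, queries):
    """Concatenated outputs on a shared query set: the meta-features."""
    with torch.no_grad():
        return model(queries).flatten()

# Jumbo learning (sketch): sample many shadow models, with half standing
# in for models poisoned under randomly drawn Trojan strategies. Here the
# Trojan labels are assigned artificially instead of by actual poisoning.
queries = torch.randn(N_QUERIES, D_IN)
feats, labels = [], []
for i in range(64):
    model = make_shadow_model()     # in MNTD: trained on clean/poisoned data
    feats.append(feature_vector(model, queries))
    labels.append(float(i % 2))     # 1 = "Trojaned" shadow model (faked)

X = torch.stack(feats)              # (64, N_QUERIES * N_CLASSES)
y = torch.tensor(labels)

# Meta-classifier: maps query-response features to a Trojan score.
meta = nn.Sequential(nn.Linear(N_QUERIES * N_CLASSES, 16), nn.ReLU(),
                     nn.Linear(16, 1))
opt = torch.optim.Adam(meta.parameters(), lr=1e-2)
loss_fn = nn.BCEWithLogitsLoss()
for _ in range(100):
    opt.zero_grad()
    loss = loss_fn(meta(X).squeeze(1), y)
    loss.backward()
    opt.step()

# At detection time: score a target model by its outputs on the same queries.
target = make_shadow_model()
score = torch.sigmoid(meta(feature_vector(target, queries)))
print(f"P(Trojaned) = {score.item():.3f}")
```

Note that in the full MNTD pipeline the query inputs are also optimized jointly with the meta-classifier (query tuning) rather than fixed at random, which is what lets a small query set expose Trojan behavior.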


