Detecting Adversarial Attacks on Neural Network Policies with Visual Foresight

10/02/2017
by Yen-Chen Lin, et al.

Deep reinforcement learning has shown promising results in learning control policies for complex sequential decision-making tasks. However, these neural network-based policies are known to be vulnerable to adversarial examples. This vulnerability poses a potentially serious threat to safety-critical systems such as autonomous vehicles. In this paper, we propose a defense mechanism that protects reinforcement learning agents from adversarial attacks by leveraging an action-conditioned frame prediction module. Our core insight is that adversarial examples crafted to target a neural network-based policy are not effective against the frame prediction model. By comparing the action distribution the policy produces from the currently observed frame with the action distribution it produces from the frame predicted by the action-conditioned frame prediction module, we can detect the presence of adversarial examples. Beyond detection, our method allows the agent to continue performing the task by acting on the predicted frame while it is under attack. We evaluate our algorithm on five Atari 2600 games. The results demonstrate that the proposed defense mechanism outperforms baseline algorithms both in detecting adversarial examples and in earning rewards while the agent is under attack.
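The detection test described above can be summarized in a few lines. The sketch below is a minimal illustration, not the authors' implementation: the policy and frame_predictor callables, the use of KL divergence as the comparison measure, and the threshold value are all assumptions introduced here for illustration.

    import numpy as np

    def kl_divergence(p, q, eps=1e-8):
        # KL(p || q) between two discrete action distributions.
        p = np.clip(p, eps, 1.0)
        q = np.clip(q, eps, 1.0)
        return float(np.sum(p * np.log(p / q)))

    def detect_and_act(policy, frame_predictor, past_frames, last_action,
                       current_frame, threshold=0.01):
        # policy(frame) -> distribution over discrete actions.
        # frame_predictor(past_frames, last_action) -> predicted current frame.
        predicted_frame = frame_predictor(past_frames, last_action)

        p_current = policy(current_frame)      # may be adversarially perturbed
        p_predicted = policy(predicted_frame)  # the perturbation does not transfer here

        # A large divergence between the two action distributions flags an attack.
        under_attack = kl_divergence(p_predicted, p_current) > threshold

        # While under attack, act on the predicted frame instead of the observed one.
        action = int(np.argmax(p_predicted if under_attack else p_current))
        return under_attack, action

In this sketch, the agent keeps acting even when an attack is flagged, simply by switching its input from the observed frame to the predicted one, which mirrors the recovery behavior described in the abstract.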

Related research

05/18/2017 · Delving into adversarial attacks on deep policies
Adversarial examples have been shown to exist for a variety of deep lear...

05/16/2022 · Attacking and Defending Deep Reinforcement Learning Policies
Recent studies have shown that deep reinforcement learning (DRL) policie...

10/28/2019 · Certified Adversarial Robustness for Deep Reinforcement Learning
Deep Neural Network-based systems are now the state-of-the-art in many r...

01/16/2017 · Vulnerability of Deep Reinforcement Learning to Policy Induction Attacks
Deep learning classifiers are known to be inherently vulnerable to manip...

11/10/2019 · Minimalistic Attacks: How Little it Takes to Fool a Deep Reinforcement Learning Policy
Recent studies have revealed that neural network-based policies can be e...

01/11/2020 · Sparse Black-box Video Attack with Reinforcement Learning
Adversarial attacks on video recognition models have been explored recen...

05/09/2021 · Learning Image Attacks toward Vision Guided Autonomous Vehicles
While adversarial neural networks have been shown successful for static ...
