Optimal Attacks on Reinforcement Learning Policies

by   Alessio Russo, et al.

Control policies, trained using the Deep Reinforcement Learning, have been recently shown to be vulnerable to adversarial attacks introducing even very small perturbations to the policy input. The attacks proposed so far have been designed using heuristics, and build on existing adversarial example crafting techniques used to dupe classifiers in supervised learning. In contrast, this paper investigates the problem of devising optimal attacks, depending on a well-defined attacker's objective, e.g., to minimize the main agent average reward. When the policy and the system dynamics, as well as rewards, are known to the attacker, a scenario referred to as a white-box attack, designing optimal attacks amounts to solving a Markov Decision Process. For what we call black-box attacks, where neither the policy nor the system is known, optimal attacks can be trained using Reinforcement Learning techniques. Through numerical experiments, we demonstrate the efficiency of our attacks compared to existing attacks (usually based on Gradient methods). We further quantify the potential impact of attacks and establish its connection to the smoothness of the policy under attack. Smooth policies are naturally less prone to attacks (this explains why Lipschitz policies, with respect to the state, are more resilient). Finally, we show that from the main agent perspective, the system uncertainties and the attacker can be modeled as a Partially Observable Markov Decision Process. We actually demonstrate that using Reinforcement Learning techniques tailored to POMDP (e.g. using Recurrent Neural Networks) leads to more resilient policies.


page 3

page 4

page 5

page 6

page 9

page 11

page 14

page 16


Adversarial Attacks on Neural Network Policies

Machine learning classifiers are known to be vulnerable to inputs malici...

White-Box Adversarial Policies in Deep Reinforcement Learning

Adversarial examples against AI systems pose both risks via malicious at...

Adversarial joint attacks on legged robots

We address adversarial attacks on the actuators at the joints of legged ...

Defending Observation Attacks in Deep Reinforcement Learning via Detection and Denoising

Neural network policies trained using Deep Reinforcement Learning (DRL) ...

Balancing detectability and performance of attacks on the control channel of Markov Decision Processes

We investigate the problem of designing optimal stealthy poisoning attac...

How memory architecture affects performance and learning in simple POMDPs

Reinforcement learning is made much more complex when the agent's observ...

RADAMS: Resilient and Adaptive Alert and Attention Management Strategy against Informational Denial-of-Service (IDoS) Attacks

Attacks exploiting human attentional vulnerability have posed severe thr...

Please sign up or login with your details

Forgot password? Click here to reset