Understanding the Limits of Poisoning Attacks in Episodic Reinforcement Learning

08/29/2022
by Anshuka Rangi, et al.

To understand the security threats to reinforcement learning (RL) algorithms, this paper studies poisoning attacks that manipulate any order-optimal learning algorithm towards a targeted policy in episodic RL, and examines the potential damage of two natural types of poisoning attacks: manipulation of rewards and manipulation of actions. We discover that the effect of an attack crucially depends on whether the rewards are bounded or unbounded. In bounded reward settings, we show that reward manipulation alone or action manipulation alone cannot guarantee a successful attack. However, by combining reward and action manipulation, the adversary can manipulate any order-optimal learning algorithm to follow any targeted policy with Θ̃(√T) total attack cost, which is order-optimal, without any knowledge of the underlying MDP. In contrast, in unbounded reward settings, we show that reward manipulation alone is sufficient for an adversary to manipulate any order-optimal learning algorithm to follow any targeted policy using Õ(√T) amount of contamination. Our results reveal useful insights about what poisoning attacks can and cannot achieve, and are set to spur more work on the design of robust RL algorithms.
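To illustrate the reward-manipulation attack in the unbounded-reward setting, here is a minimal toy sketch (not the paper's construction): an epsilon-greedy learner on a two-action episodic problem, where an adversary subtracts a penalty from the reward of every non-target action so the target action appears optimal. All names and parameters here are illustrative assumptions.

```python
import random

def reward_poisoning_attack(T=2000, target_action=0, penalty=1.0, seed=0):
    """Toy sketch of reward poisoning (illustrative, not the paper's attack).

    An epsilon-greedy learner estimates action values from observed rewards.
    The adversary perturbs rewards (unbounded-reward setting) so the
    non-target action always looks worse, steering the learner toward
    `target_action`. Returns the fraction of pulls of the target action
    and the total attack cost (sum of |perturbations|)."""
    rng = random.Random(seed)
    true_mean = [0.2, 0.8]          # the target action is actually the worse one
    q = [0.0, 0.0]                  # learner's running value estimates
    counts = [0, 0]
    attack_cost = 0.0
    target_pulls = 0
    for t in range(1, T + 1):
        eps = 1.0 / (t ** 0.5)      # decaying exploration rate
        if rng.random() < eps:
            a = rng.randrange(2)
        else:
            a = max(range(2), key=lambda i: q[i])
        r = true_mean[a] + rng.gauss(0, 0.1)
        if a != target_action:      # adversary lowers non-target rewards
            r -= penalty
            attack_cost += penalty
        counts[a] += 1
        q[a] += (r - q[a]) / counts[a]   # incremental mean update
        target_pulls += (a == target_action)
    return target_pulls / T, attack_cost

frac, cost = reward_poisoning_attack()
```

Because exploration decays, the non-target action is pulled only a sublinear number of times, so the total attack cost grows much more slowly than T — the qualitative phenomenon behind the Õ(√T) contamination bound the abstract describes.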


Related research

02/07/2020
Manipulating Reinforcement Learning: Poisoning Attacks on Cost Signals
This chapter studies emerging cyber-attacks on reinforcement learning (R...

06/24/2019
Deceptive Reinforcement Learning Under Adversarial Manipulations on Cost Signals
This paper studies reinforcement learning (RL) under malicious falsifica...

01/14/2021
How to Attack and Defend 5G Radio Access Network Slicing with Reinforcement Learning
Reinforcement learning (RL) for network slicing is considered in the 5G ...

09/08/2022
Reward Delay Attacks on Deep Reinforcement Learning
Most reinforcement learning algorithms implicitly assume strong synchron...

01/03/2022
Execute Order 66: Targeted Data Poisoning for Reinforcement Learning
Data poisoning for reinforcement learning has historically focused on ge...

02/17/2020
Robust Stochastic Bandit Algorithms under Probabilistic Unbounded Adversarial Attack
The multi-armed bandit formalism has been extensively studied under vari...

07/29/2023
PIMbot: Policy and Incentive Manipulation for Multi-Robot Reinforcement Learning in Social Dilemmas
Recent research has demonstrated the potential of reinforcement learning...
