Delayed Feedback in Episodic Reinforcement Learning

11/15/2021
by Benjamin Howson, et al.

There are many provably efficient algorithms for episodic reinforcement learning. However, these algorithms assume that the sequence of states, actions and rewards from each episode is observed immediately, so the policy can be updated after every interaction with the environment. This assumption is often unrealistic in practice, particularly in areas such as healthcare and online recommendation. In this paper, we study the impact of delayed feedback on several provably efficient algorithms for regret minimisation in episodic reinforcement learning. First, we consider updating the policy as soon as new feedback becomes available. Under this updating scheme, we show that the regret increases by an additive term involving the number of states, the number of actions, the episode length and the expected delay. The form of this additive term depends on which optimistic algorithm is used. We also show that updating the policy less frequently can improve the dependence of the regret on the delays.
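The two update schedules mentioned in the abstract can be illustrated with a toy simulation. This is a minimal sketch, not the paper's algorithms. It assumes several things the abstract does not specify. Delays are drawn from a geometric distribution with a given mean. Feedback from episode k can be used from the start of episode k + 1 + delay. Recomputing the optimistic policy is abstracted to a counter. For the "less frequent" schedule, a hypothetical doubling trigger is used: the policy is recomputed only when the number of observed trajectories has doubled since the last update. The paper's own schedule may differ.

```python
import heapq
import random


def simulate(num_episodes, mean_delay, lazy, seed=0):
    """Count policy updates under delayed episodic feedback (toy model).

    Assumption: episode k's trajectory becomes usable at the start of
    episode k + 1 + d_k, where d_k ~ Geometric on {0, 1, ...} with mean
    `mean_delay`. Returns (number of policy updates, trajectories observed).
    """
    rng = random.Random(seed)
    pending = []                 # min-heap of (arrival_episode, episode_index)
    observed = 0                 # trajectories the learner has received so far
    observed_at_last_update = 0
    updates = 0
    p = 1.0 / (1.0 + mean_delay)  # success probability, so E[d_k] = mean_delay

    for k in range(num_episodes):
        # 1. Collect all feedback that has arrived by the start of episode k.
        new = 0
        while pending and pending[0][0] <= k:
            heapq.heappop(pending)
            new += 1
        observed += new

        # 2. Decide whether to recompute the (optimistic) policy.
        if lazy:
            # Hypothetical doubling trigger: update once the data has doubled.
            should_update = observed >= 2 * max(1, observed_at_last_update)
        else:
            # Update as soon as any new feedback is available.
            should_update = new > 0
        if should_update:
            updates += 1
            observed_at_last_update = observed

        # 3. Play episode k with the current policy; its feedback is delayed.
        delay = 0
        while rng.random() > p:  # count failures before the first success
            delay += 1
        heapq.heappush(pending, (k + 1 + delay, k))

    return updates, observed


if __name__ == "__main__":
    for lazy in (False, True):
        print("lazy" if lazy else "immediate", simulate(1000, mean_delay=5, lazy=lazy))
```

With the same seed, both schedules see exactly the same feedback, because the delays do not depend on the schedule. Only the number of policy recomputations differs. The immediate scheme can update in almost every episode, while the doubling trigger updates only logarithmically often in the amount of data received.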


research
06/30/2020

Provably More Efficient Q-Learning in the Full-Feedback/One-Sided-Feedback Settings

We propose two new Q-learning algorithms, Full-Q-Learning (FQL) and Elim...
research
06/04/2013

(More) Efficient Reinforcement Learning via Posterior Sampling

Most provably-efficient learning algorithms introduce optimism about poo...
research
07/19/2019

Delegative Reinforcement Learning: learning to avoid traps with a little help

Most known regret bounds for reinforcement learning are either episodic ...
research
01/30/2023

Improved Regret for Efficient Online Reinforcement Learning with Linear Function Approximation

We study reinforcement learning with linear function approximation and a...
research
05/13/2023

Delay-Adapted Policy Optimization and Improved Regret for Adversarial MDP with Delayed Bandit Feedback

Policy Optimization (PO) is one of the most popular methods in Reinforce...
research
06/04/2013

Online Learning under Delayed Feedback

Online learning with delayed feedback has received increasing attention ...
research
03/02/2023

Optimal Rates and Efficient Algorithms for Online Bayesian Persuasion

Bayesian persuasion studies how an informed sender should influence beli...
