Why is Posterior Sampling Better than Optimism for Reinforcement Learning?

07/01/2016
by Ian Osband, et al.

Computational results demonstrate that posterior sampling for reinforcement learning (PSRL) dramatically outperforms algorithms driven by optimism, such as UCRL2. We provide insight into the extent of this performance boost and the phenomenon that drives it. We leverage this insight to establish an Õ(H√(SAT)) Bayesian expected regret bound for PSRL in finite-horizon episodic Markov decision processes, where H is the horizon, S is the number of states, A is the number of actions and T is the time elapsed. This improves upon the best previous bound of Õ(H S √(AT)) for any reinforcement learning algorithm.
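To make the algorithm named in the abstract concrete, here is a minimal sketch of PSRL for a small tabular, finite-horizon episodic MDP. It is an illustration under stated assumptions, not the authors' experimental code. The Dirichlet(1) prior on transitions, the N(0, 1) prior on mean rewards with unit-variance noise, the fixed start state 0, and the function names `sample_mdp`, `plan` and `psrl` are all hypothetical choices made for this example. Each episode draws one MDP from the posterior, plans optimally for that sample, and follows the resulting policy for H steps.

```python
import random

def sample_mdp(counts, rsum, rn, rng):
    """Draw one MDP from the posterior (assumed priors: Dirichlet(1), N(0, 1))."""
    S, A = len(counts), len(counts[0])
    P = [[None] * A for _ in range(S)]
    R = [[0.0] * A for _ in range(S)]
    for s in range(S):
        for a in range(A):
            # Dirichlet sample via normalised Gamma draws.
            g = [rng.gammavariate(1.0 + c, 1.0) for c in counts[s][a]]
            total = sum(g)
            P[s][a] = [x / total for x in g]
            # Gaussian posterior over the mean reward (unit-variance noise).
            n = rn[s][a]
            R[s][a] = rng.gauss(rsum[s][a] / (n + 1.0), (1.0 / (n + 1.0)) ** 0.5)
    return P, R

def plan(P, R, H):
    """Finite-horizon value iteration; returns policy[h][s]."""
    S, A = len(P), len(P[0])
    V = [0.0] * S
    policy = [[0] * S for _ in range(H)]
    for h in reversed(range(H)):
        new_V = [0.0] * S
        for s in range(S):
            q = [R[s][a] + sum(P[s][a][t] * V[t] for t in range(S))
                 for a in range(A)]
            best = max(range(A), key=lambda a: q[a])
            policy[h][s] = best
            new_V[s] = q[best]
        V = new_V
    return policy

def psrl(true_P, true_R, H, episodes, seed=0):
    """Run PSRL against a known 'true' MDP; returns per-episode expected reward and counts."""
    rng = random.Random(seed)
    S, A = len(true_P), len(true_P[0])
    counts = [[[0] * S for _ in range(A)] for _ in range(S)]
    rsum = [[0.0] * A for _ in range(S)]
    rn = [[0] * A for _ in range(S)]
    returns = []
    for _ in range(episodes):
        # One posterior sample per episode, then act greedily for that sample.
        policy = plan(*sample_mdp(counts, rsum, rn, rng), H)
        s, total = 0, 0.0  # assumed fixed start state
        for h in range(H):
            a = policy[h][s]
            total += true_R[s][a]
            r = true_R[s][a] + rng.gauss(0.0, 1.0)  # noisy observed reward
            t = rng.choices(range(S), weights=true_P[s][a])[0]
            counts[s][a][t] += 1
            rsum[s][a] += r
            rn[s][a] += 1
            s = t
        returns.append(total)
    return returns, counts
```

The sketch never builds confidence sets or adds exploration bonuses. Exploration comes only from the randomness of the posterior sample, which is the contrast with optimism-based methods such as UCRL2 that the abstract draws.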


research
07/19/2019

Delegative Reinforcement Learning: learning to avoid traps with a little help

Most known regret bounds for reinforcement learning are either episodic ...
research
06/04/2013

(More) Efficient Reinforcement Learning via Posterior Sampling

Most provably-efficient learning algorithms introduce optimism about poo...
research
02/21/2022

Double Thompson Sampling in Finite stochastic Games

We consider the trade-off problem between exploration and exploitation u...
research
09/08/2021

Learning Zero-sum Stochastic Games with Posterior Sampling

In this paper, we propose Posterior Sampling Reinforcement Learning for ...
research
11/29/2022

Posterior Sampling for Continuing Environments

We develop an extension of posterior sampling for reinforcement learning...
research
05/13/2023

Thompson Sampling for Parameterized Markov Decision Processes with Uninformative Actions

We study parameterized MDPs (PMDPs) in which the key parameters of inter...
research
08/09/2016

Posterior Sampling for Reinforcement Learning Without Episodes

This is a brief technical note to clarify some of the issues with applyi...
