Optimistic Posterior Sampling for Reinforcement Learning with Few Samples and Tight Guarantees

by Daniil Tiapkin, et al.

We consider reinforcement learning in an environment modeled by an episodic, finite, stage-dependent Markov decision process of horizon H with S states and A actions. The performance of an agent is measured by the regret after interacting with the environment for T episodes. We propose an optimistic posterior sampling algorithm for reinforcement learning (OPSRL), a simple variant of posterior sampling that needs only a number of posterior samples logarithmic in H, S, A, and T per state-action pair. For OPSRL we guarantee a high-probability regret bound of order at most 𝒪(√(H^3SAT)), ignoring polylog(HSAT) terms. The key novel technical ingredient is a new sharp anti-concentration inequality for linear forms, which may be of independent interest. Specifically, we extend the normal approximation-based lower bound for Beta distributions by Alfers and Dinges [1984] to Dirichlet distributions. Our bound matches the lower bound of order Ω(√(H^3SAT)), thereby answering the open problems raised by Agrawal and Jia [2017b] for the episodic setting.
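The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea of optimistic posterior sampling in a tabular, stage-dependent MDP, not the authors' exact procedure. It assumes a Dirichlet posterior over transitions, known rewards in [0, 1], and optimism obtained by taking the maximum over J posterior samples per state-action pair. The function names, the uniform prior, and the constant `c` in the sample count are all hypothetical choices made for illustration.

```python
import math
import random


def sample_dirichlet(alpha, rng):
    """Draw one Dirichlet(alpha) sample by normalizing independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def num_posterior_samples(H, S, A, T, c=1.0):
    """Hypothetical sample count that grows logarithmically in H, S, A, T.

    The constant c is an assumption of this sketch, not a value from the paper.
    """
    return max(1, math.ceil(c * math.log(H * S * A * T)))


def optimistic_q_values(counts, rewards, H, J, rng, prior=1.0):
    """Backward induction where each Q-value uses the best of J posterior samples.

    counts[h][s][a][s2]: observed transition counts at stage h.
    rewards[h][s][a]:    rewards in [0, 1], assumed known in this sketch.
    prior:               uniform Dirichlet prior mass per next state (assumed).
    """
    S = len(counts[0])
    A = len(counts[0][0])
    V = [[0.0] * S for _ in range(H + 1)]  # V[H] is the terminal value, all zeros
    Q = [[[0.0] * A for _ in range(S)] for _ in range(H)]
    for h in reversed(range(H)):
        for s in range(S):
            for a in range(A):
                alpha = [prior + n for n in counts[h][s][a]]
                best = -math.inf
                for _ in range(J):
                    p = sample_dirichlet(alpha, rng)
                    next_value = sum(pi * v for pi, v in zip(p, V[h + 1]))
                    best = max(best, next_value)
                Q[h][s][a] = rewards[h][s][a] + best
            V[h][s] = max(Q[h][s])
    return Q, V
```

In an episode loop, an agent following this sketch would recompute `Q` from the current counts, act greedily with respect to it, and then increment `counts[h][s][a][s2]` for each observed transition. Taking the maximum over several samples is one simple way to make a sampled value optimistic with reasonable probability; how many samples are needed is exactly the kind of question the anti-concentration inequality mentioned above addresses.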


∙ 07/19/2019

Delegative Reinforcement Learning: learning to avoid traps with a little help

Most known regret bounds for reinforcement learning are either episodic ...
∙ 05/16/2022

From Dirichlet to Rubin: Optimistic Exploration in RL without Bonuses

We propose the Bayes-UCBVI algorithm for reinforcement learning in tabul...
∙ 03/01/2021

UCB Momentum Q-learning: Correcting the bias without forgetting

We propose UCBMQ, Upper Confidence Bound Momentum Q-learning, a new algo...
∙ 11/29/2022

Posterior Sampling for Continuing Environments

We develop an extension of posterior sampling for reinforcement learning...
∙ 02/21/2022

Double Thompson Sampling in Finite stochastic Games

We consider the trade-off problem between exploration and exploitation u...
∙ 11/08/2020

Online Sparse Reinforcement Learning

We investigate the hardness of online reinforcement learning in fixed ho...
∙ 06/20/2019

Near-optimal Reinforcement Learning using Bayesian Quantiles

We study model-based reinforcement learning in finite communicating Mark...