Near-optimal Reinforcement Learning using Bayesian Quantiles

by Aristide Tossou et al.

We study model-based reinforcement learning in finite communicating Markov Decision Processes (MDPs). Algorithms in this setting have been developed along two lines: the first, which typically provides frequentist performance guarantees, uses optimism in the face of uncertainty as the guiding algorithmic principle; the second is based on Bayesian reasoning, combined with posterior sampling and Bayesian guarantees. In this paper, we develop a conceptually simple algorithm, Bayes-UCRL, that combines the benefits of both approaches to achieve state-of-the-art performance for finite communicating MDPs. In particular, we use a Bayesian prior similarly to posterior sampling. However, instead of sampling an MDP, we construct an optimistic MDP using the quantiles of the Bayesian prior. We show that this technique enjoys a high-probability worst-case regret of order Õ(√(DSAT)). Experiments in a diverse set of environments show that our algorithm outperforms previous methods.
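The core idea in the abstract — replacing posterior *sampling* with posterior *quantiles* to obtain deterministic optimism — can be illustrated with a minimal sketch. This is not the authors' exact Bayes-UCRL algorithm; it only contrasts the two constructions for Bernoulli rewards with Beta posteriors (the MDP sizes, variable names, and the fixed quantile level `delta` are all illustrative assumptions).

```python
# Hedged sketch: quantile-based optimism vs. posterior sampling for the
# reward model of each state-action pair. A full Bayes-UCRL-style method
# would also build optimistic transitions and solve the resulting MDP.
import numpy as np
from scipy.stats import beta

S, A = 3, 2        # illustrative small MDP (sizes are assumptions)
delta = 0.05       # quantile level; a real algorithm would schedule this

# Posterior over Bernoulli rewards: Beta(alpha, beta) per (s, a),
# starting from a uniform Beta(1, 1) prior.
alpha = np.ones((S, A))
beta_param = np.ones((S, A))

def optimistic_rewards(alpha, beta_param, delta):
    """Upper (1 - delta)-quantile of each Beta posterior (quantile optimism)."""
    return beta.ppf(1.0 - delta, alpha, beta_param)

def sampled_rewards(alpha, beta_param, rng):
    """Posterior-sampling alternative: one random draw per (s, a)."""
    return rng.beta(alpha, beta_param)

rng = np.random.default_rng(0)
r_opt = optimistic_rewards(alpha, beta_param, delta)
r_psrl = sampled_rewards(alpha, beta_param, rng)

# Under the uniform Beta(1, 1) posterior, the upper quantile is exactly
# 1 - delta, while a posterior sample is a uniform draw in [0, 1]:
# quantile optimism is deterministic, sampling is random.
print(r_opt[0, 0])   # 0.95 for delta = 0.05
```

The contrast mirrors the abstract: both approaches use the same Bayesian posterior, but the quantile construction always yields an optimistic model, which is what enables UCRL-style worst-case regret analysis.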




Near-optimal Optimistic Reinforcement Learning using Empirical Bernstein Inequalities

We study model-based reinforcement learning in an unknown finite communi...

Near-optimal Bayesian Solution For Unknown Discrete Markov Decision Process

We tackle the problem of acting in an unknown finite and discrete Markov...

Model-based Lifelong Reinforcement Learning with Bayesian Exploration

We propose a model-based lifelong reinforcement-learning approach that e...

Optimistic Posterior Sampling for Reinforcement Learning with Few Samples and Tight Guarantees

We consider reinforcement learning in an environment modeled by an episo...

Randomized Exploration is Near-Optimal for Tabular MDP

We study exploration using randomized value functions in Thompson Sampli...

Group Distributionally Robust Reinforcement Learning with Hierarchical Latent Variables

One key challenge for multi-task Reinforcement learning (RL) in practice...

Accelerating the Computation of UCB and Related Indices for Reinforcement Learning

In this paper we derive an efficient method for computing the indices as...
