Improved Analysis of UCRL2 with Empirical Bernstein Inequality

07/10/2020
by Ronan Fruit, et al.

We consider the problem of exploration-exploitation in communicating Markov Decision Processes (MDPs). We provide an analysis of UCRL2 with Empirical Bernstein inequalities (UCRL2B). For any MDP with S states, A actions, Γ ≤ S next states and diameter D, the regret of UCRL2B is bounded as Õ(√(DΓSAT)).
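
As a rough illustration of why an empirical Bernstein inequality can tighten confidence intervals, the sketch below compares a variance-aware width with a Hoeffding width for samples in [0, 1]. It assumes the Maurer and Pontil (2009) form of the bound; the function names are hypothetical, and the constants are not claimed to match those used in the UCRL2B analysis.

```python
import math


def empirical_bernstein_width(samples, delta):
    """One-sided empirical Bernstein deviation width for samples in [0, 1].

    Assumed form (Maurer & Pontil, 2009), not the paper's exact constants:
        sqrt(2 * V_n * ln(2/delta) / n) + 7 * ln(2/delta) / (3 * (n - 1)),
    where V_n is the unbiased sample variance.
    """
    n = len(samples)
    if n < 2:
        return 1.0  # trivial width: the variance cannot be estimated
    mean = sum(samples) / n
    variance = sum((x - mean) ** 2 for x in samples) / (n - 1)
    log_term = math.log(2.0 / delta)
    return math.sqrt(2.0 * variance * log_term / n) + 7.0 * log_term / (3.0 * (n - 1))


def hoeffding_width(n, delta):
    """One-sided Hoeffding deviation width for n samples in [0, 1]."""
    return math.sqrt(math.log(1.0 / delta) / (2.0 * n))


# Low-variance data: the Bernstein width shrinks roughly as 1/n,
# while the Hoeffding width shrinks only as 1/sqrt(n).
low_variance = [0.5] * 1000
bernstein = empirical_bernstein_width(low_variance, delta=0.05)
hoeffding = hoeffding_width(len(low_variance), delta=0.05)
print(bernstein < hoeffding)
```

When the observed variance is small, the first term nearly vanishes and the width is dominated by the 1/n term, which is the general mechanism by which variance-aware bounds can improve on Hoeffding-style ones.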


research
12/11/2018

Exploration Bonus for Regret Minimization in Undiscounted Discrete and Continuous Markov Decision Processes

We introduce and analyse two algorithms for exploration-exploitation in ...
research
07/06/2018

Near Optimal Exploration-Exploitation in Non-Communicating Markov Decision Processes

While designing the state space of an MDP, it is common to include state...
research
10/15/2018

Successor Uncertainties: exploration and uncertainty in temporal difference learning

We consider the problem of balancing exploration and exploitation in seq...
research
06/17/2019

Of Cores: A Partial-Exploration Framework for Markov Decision Processes

We introduce a framework for approximate analysis of Markov decision pro...
research
07/27/2022

Satisfiability Bounds for ω-Regular Properties in Bounded-Parameter Markov Decision Processes

We consider the problem of computing minimum and maximum probabilities o...
research
08/03/2009

Regret Bounds for Opportunistic Channel Access

We consider the task of opportunistic channel access in a primary system...
research
06/14/2019

Online Allocation and Pricing: Constant Regret via Bellman Inequalities

We develop a framework for designing tractable heuristics for Markov Dec...
