The Best of Both Worlds: Reinforcement Learning with Logarithmic Regret and Policy Switches

03/03/2022
by Grigoris Velegkas, et al.

In this paper, we study the problem of regret minimization for episodic Reinforcement Learning (RL) in both the model-free and the model-based settings. We focus on learning with general function classes and general model classes, and we derive results that scale with the eluder dimension of these classes. In contrast to the existing body of work, which mainly establishes instance-independent regret guarantees, we focus on the instance-dependent setting and show that the regret scales logarithmically with the horizon T, provided that there is a gap between the best and the second-best action in every state. In addition, we show that such a logarithmic regret bound is achievable by algorithms with O(log T) switching cost (also known as adaptivity complexity). In other words, these algorithms rarely switch their policy during the course of their execution. Finally, we complement our results with lower bounds which show that, even in the tabular setting, regret of order o(log T) cannot be guaranteed.
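
As a rough illustration of the two quantities the abstract refers to, the sketch below computes the gap between the best and second-best action for a small, made-up table of optimal action values, and counts the policy changes made by a simple doubling schedule. Both the table `q_star` and the doubling rule are assumptions chosen for illustration; the doubling rule is a generic way to obtain a logarithmic number of switches and is not claimed to be the switching criterion used in the paper.

```python
def state_gap(action_values):
    """Gap between the best and the second-best action value in one state."""
    top = sorted(action_values, reverse=True)
    return top[0] - top[1]


def min_gap(q_star):
    """Smallest best-vs-second-best gap over all states."""
    return min(state_gap(values) for values in q_star)


def doubling_switch_episodes(num_episodes):
    """Episodes at which a doubling schedule changes the policy: 1, 2, 4, 8, ..."""
    episodes = []
    k = 1
    while k <= num_episodes:
        episodes.append(k)
        k *= 2
    return episodes


# Hypothetical optimal action values: one row per state, one column per action.
q_star = [
    [1.0, 0.5, 0.75],
    [2.0, 1.5, 1.0],
]

gap = min_gap(q_star)
switches = doubling_switch_episodes(1000)
print("minimum gap:", gap)
print("number of policy switches over 1000 episodes:", len(switches))
```

Under this schedule the number of switches grows like log2 of the number of episodes, which matches the O(log T) switching cost in spirit only; the instance-dependent regret bounds in the abstract depend on the gap, which is why a strictly positive gap in every state is assumed.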

research
11/23/2020

Logarithmic Regret for Reinforcement Learning with Linear Function Approximation

Reinforcement learning (RL) with linear function approximation has recei...
research
04/21/2020

Almost Optimal Model-Free Reinforcement Learning via Reference-Advantage Decomposition

We study the reinforcement learning problem in the setting of finite-hor...
research
05/05/2013

Regret Bounds for Reinforcement Learning with Policy Advice

In some reinforcement learning problems an agent may be provided with a ...
research
07/01/2021

Gap-Dependent Bounds for Two-Player Markov Games

As one of the most popular methods in the field of reinforcement learnin...
research
06/16/2020

Q-learning with Logarithmic Regret

This paper presents the first non-asymptotic result showing that a model...
research
02/07/2022

On learning Whittle index policy for restless bandits with scalable regret

Reinforcement learning is an attractive approach to learn good resource ...
research
05/15/2023

Horizon-free Reinforcement Learning in Adversarial Linear Mixture MDPs

Recent studies have shown that episodic reinforcement learning (RL) is n...
