A Model-free Learning Algorithm for Infinite-horizon Average-reward MDPs with Near-optimal Regret

06/08/2020
by Mehdi Jafarnia-Jahromi, et al.

Recently, model-free reinforcement learning has attracted research attention because of its simplicity, its memory and computation efficiency, and its flexibility to be combined with function approximation. In this paper, we propose Exploration Enhanced Q-learning (EE-QL), a model-free algorithm for infinite-horizon average-reward Markov Decision Processes (MDPs) that achieves a regret bound of O(√T) for the general class of weakly communicating MDPs, where T is the number of interactions. EE-QL assumes that an online concentrating approximation of the optimal average reward is available. This is the first model-free learning algorithm that achieves O(√T) regret without the ergodicity assumption, and it matches the lower bound in terms of T up to logarithmic factors. Experiments show that the proposed algorithm performs as well as the best known model-based algorithms.
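To make the setting concrete, here is a minimal sketch of tabular Q-learning for an average-reward MDP that, like EE-QL, is handed an online estimate of the optimal average reward J*. This is NOT EE-QL. It is a generic relative Q-learning update, Q(s,a) ← Q(s,a) + α(r − Ĵ + max_a' Q(s',a') − Q(s,a)). Plain ε-greedy exploration stands in for the paper's exploration scheme. The sketch also computes the usual average-reward regret, R_T = T·J* − Σ_t r_t. The function names `average_reward_q_learning` and `regret`, the parameters `j_hat` and `alpha_fn`, and the toy MDP are all hypothetical.

```python
import random


def average_reward_q_learning(P, R, j_hat, T, alpha_fn, epsilon=0.1, seed=0):
    """Hypothetical sketch: relative Q-learning with a supplied estimate of J*.

    P[s][a] is a list of next-state probabilities, R[s][a] a deterministic
    reward, j_hat(t) an (assumed) online estimate of the optimal average
    reward at step t, and alpha_fn(n) the step size for the n-th visit of (s, a).
    Exploration is plain epsilon-greedy, not the scheme used by EE-QL.
    """
    rng = random.Random(seed)
    n_states, n_actions = len(P), len(P[0])
    Q = [[0.0] * n_actions for _ in range(n_states)]
    visits = [[0] * n_actions for _ in range(n_states)]
    s, total_reward = 0, 0.0
    for t in range(1, T + 1):
        if rng.random() < epsilon:
            a = rng.randrange(n_actions)  # explore uniformly at random
        else:
            # greedy action; max() returns the lowest-index action on ties
            a = max(range(n_actions), key=lambda b: Q[s][b])
        r = R[s][a]
        s_next = rng.choices(range(n_states), weights=P[s][a])[0]
        visits[s][a] += 1
        alpha = alpha_fn(visits[s][a])
        # relative target: reward minus the average-reward estimate, plus the
        # best Q-value at the next state (no discounting)
        target = r - j_hat(t) + max(Q[s_next])
        Q[s][a] += alpha * (target - Q[s][a])
        total_reward += r
        s = s_next
    return Q, total_reward


def regret(j_star, T, total_reward):
    """Average-reward regret after T steps: T * J* minus the collected reward."""
    return T * j_star - total_reward
```

For example, in a one-state MDP where action 1 always pays 1 and action 0 pays 0, J* = 1. Passing the exact value `j_hat=lambda t: 1.0` should make Q[0][1] exceed Q[0][0], and the regret is then the number of steps spent on action 0.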


