Optimal Sample Complexity of Reinforcement Learning for Uniformly Ergodic Discounted Markov Decision Processes

02/15/2023
by   Shengbo Wang, et al.

We study the optimal sample complexity of tabular reinforcement learning (RL) for controlling the infinite-horizon discounted reward in a Markov decision process (MDP). Optimal min-max complexity results have been developed for tabular RL in this setting, giving a sample complexity dependence on γ and ϵ of the form Θ̃((1-γ)^-3 ϵ^-2), where γ is the discount factor and ϵ is the error tolerance of the solution. However, in many applications of interest, the optimal policy (or all policies) will induce mixing. We show that in these settings the optimal min-max complexity is Θ̃(t_minorize (1-γ)^-2 ϵ^-2), where t_minorize is a measure of mixing that is within a constant factor of the total variation mixing time. Our analysis is based on regeneration-type ideas, which we believe are of independent interest because they can be used to study related problems for general state space MDPs.
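To make the comparison between the two rates concrete, the following is a minimal sketch that evaluates both expressions from the abstract with all constants and logarithmic factors dropped. The function names and the example values are assumptions chosen for illustration; they do not come from the paper, and the numbers returned are scaling terms, not actual sample counts.

```python
import math


def worst_case_rate(gamma: float, eps: float) -> float:
    """Scaling term (1 - gamma)^-3 * eps^-2 of the general min-max rate.

    Constants and log factors hidden by the tilde notation are ignored.
    """
    return (1.0 - gamma) ** -3 * eps ** -2


def mixing_rate(t_minorize: float, gamma: float, eps: float) -> float:
    """Scaling term t_minorize * (1 - gamma)^-2 * eps^-2 under mixing.

    Constants and log factors hidden by the tilde notation are ignored.
    """
    return t_minorize * (1.0 - gamma) ** -2 * eps ** -2


def improvement_factor(t_minorize: float, gamma: float) -> float:
    """Ratio of the two scaling terms; eps cancels out."""
    return 1.0 / (t_minorize * (1.0 - gamma))


if __name__ == "__main__":
    # Hypothetical values, chosen only to illustrate the scaling.
    gamma, eps, t_minorize = 0.99, 0.1, 10.0
    print("general scaling term:", worst_case_rate(gamma, eps))
    print("mixing scaling term: ", mixing_rate(t_minorize, gamma, eps))
    print("ratio:               ", improvement_factor(t_minorize, gamma))
```

The ratio of the two terms is 1/(t_minorize (1-γ)), so in this simplified comparison the mixing-based expression is the smaller one only when t_minorize is below 1/(1-γ).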

research
11/01/2021

Settling the Horizon-Dependence of Sample Complexity in Reinforcement Learning

Recently there is a surge of interest in understanding the horizon-depen...
research
06/18/2021

On the Sample Complexity of Batch Reinforcement Learning with Policy-Induced Data

We study the fundamental question of the sample complexity of learning a...
research
03/09/2017

Sample Efficient Feature Selection for Factored MDPs

In reinforcement learning, the state of the real world is often represen...
research
12/09/2021

Reinforcement Learning with Almost Sure Constraints

In this work we address the problem of finding feasible policies for Con...
research
03/08/2023

Policy Mirror Descent Inherently Explores Action Space

Designing computationally efficient exploration strategies for on-policy...
research
06/19/2021

A Max-Min Entropy Framework for Reinforcement Learning

In this paper, we propose a max-min entropy framework for reinforcement ...
research
06/29/2021

Globally Optimal Hierarchical Reinforcement Learning for Linearly-Solvable Markov Decision Processes

In this work we present a novel approach to hierarchical reinforcement l...
