Sample-Efficient Reinforcement Learning for Linearly-Parameterized MDPs with a Generative Model

05/28/2021
by   Bingyan Wang, et al.

The curse of dimensionality is a widely known issue in reinforcement learning (RL). In the tabular setting where the state space 𝒮 and the action space 𝒜 are both finite, to obtain a nearly optimal policy with sampling access to a generative model, the minimax optimal sample complexity scales linearly with |𝒮|×|𝒜|, which can be prohibitively large when 𝒮 or 𝒜 is large. This paper considers a Markov decision process (MDP) that admits a set of state-action features, which can linearly express (or approximate) its probability transition kernel. We show that a model-based approach (resp. Q-learning) provably learns an ε-optimal policy (resp. Q-function) with high probability as soon as the sample size exceeds the order of K/((1-γ)^3 ε^2) (resp. K/((1-γ)^4 ε^2)), up to some logarithmic factor. Here K is the feature dimension and γ∈(0,1) is the discount factor of the MDP. Both sample complexity bounds are provably tight, and our result for the model-based approach matches the minimax lower bound. Our results show that for arbitrarily large-scale MDPs, both the model-based approach and Q-learning are sample-efficient when K is relatively small, hence the title of this paper.
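To make the setting concrete, here is a minimal sketch (not the paper's exact algorithm or constants) of parametric Q-learning with a generative model on a toy MDP whose transition kernel is linear in K one-hot features. All names, the toy MDP, and the step-size schedule are illustrative assumptions: when P(·|s,a) is linear in φ(s,a) and rewards are known, the optimal Q-function reduces to R(s,a) plus γ times a K-dimensional vector indexed by the active feature, so only K parameters need to be learned, independent of |𝒮|×|𝒜|.

```python
import numpy as np

rng = np.random.default_rng(0)
S, A, K, gamma = 6, 2, 3, 0.9

# Toy linearly-parameterized MDP: phi(s, a) is one-hot over K groups,
# so P(.|s, a) = sum_k phi_k(s, a) * P_k(.), one distribution per feature.
group = rng.integers(K, size=(S, A))   # which feature each (s, a) activates
P = rng.dirichlet(np.ones(S), size=K)  # next-state distribution per feature
R = rng.random((S, A))                 # known deterministic rewards

def sample_next(s, a):
    # Generative-model access: draw s' ~ P(.|s, a)
    return rng.choice(S, p=P[group[s, a]])

# Parametric Q-learning: with linear transitions, Q*(s, a) = R[s, a]
# + gamma * w*[group[s, a]], where w*[k] = E_{s' ~ P_k}[V*(s')],
# so it suffices to learn the K-dimensional vector w.
w = np.zeros(K)
for t in range(20000):
    s, a = rng.integers(S), rng.integers(A)  # query the generative model
    s2 = sample_next(s, a)
    v_next = max(R[s2, b] + gamma * w[group[s2, b]] for b in range(A))
    alpha = 0.5 / (1 + 0.001 * t)            # decaying step size (illustrative)
    k = group[s, a]
    w[k] += alpha * (v_next - w[k])          # stochastic fixed-point update

Q = R + gamma * w[group]                     # learned Q-values, shape (S, A)
print(Q.argmax(axis=1))                      # greedy policy from the learned Q
```

Note that the number of learned parameters is K = 3, not |𝒮|×|𝒜| = 12; this is the mechanism behind the K-dependent (rather than |𝒮|×|𝒜|-dependent) sample complexity discussed above.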


Related research:

10/12/2020 · Is Plug-in Solver Sample-Efficient for Feature-based Reinforcement Learning?

05/26/2020 · Breaking the Sample Size Barrier in Model-Based Reinforcement Learning with a Generative Model

08/17/2020 · On the Sample Complexity of Reinforcement Learning with Policy Space Generalization

02/12/2021 · Is Q-Learning Minimax Optimal? A Tight Sample Complexity Analysis

06/10/2019 · On the Optimality of Sparse Model-Based Planning for Markov Decision Processes

02/13/2019 · Sample-Optimal Parametric Q-Learning with Linear Transition Models

07/15/2020 · Model-Based Multi-Agent RL in Zero-Sum Markov Games with Near-Optimal Sample Complexity
