A Provably Efficient Sample Collection Strategy for Reinforcement Learning

07/13/2020
by   Jean Tarbouriech, et al.
10

A common assumption in reinforcement learning (RL) is to have access to a generative model (i.e., a simulator of the environment), which allows to generate samples from any desired state-action pair. Nonetheless, in many settings a generative model may not be available and an adaptive exploration strategy is needed to efficiently collect samples from an unknown environment by direct interaction. In this paper, we study the scenario where an algorithm based on the generative model assumption defines the (possibly time-varying) amount of samples b(s,a) required at each state-action pair (s,a) and an exploration strategy has to learn how to generate b(s,a) samples as fast as possible. Building on recent results for regret minimization in the stochastic shortest path (SSP) setting (Cohen et al., 2020; Tarbouriech et al., 2020), we derive an algorithm that requires Õ( B D + D^3/2 S^2 A) time steps to collect the B = ∑_s,a b(s,a) desired samples, in any unknown and communicating MDP with S states, A actions and diameter D. Leveraging the generality of our strategy, we readily apply it to a variety of existing settings (e.g., model estimation, pure exploration in MDPs) for which we obtain improved sample-complexity guarantees, and to a set of new problems such as best-state identification and sparse reward discovery.

READ FULL TEXT

page 1

page 2

page 3

page 4

03/14/2023

Fast Rates for Maximum Entropy Exploration

We consider the reinforcement learning (RL) setting, in which the agent ...
05/31/2023

Replicability in Reinforcement Learning

We initiate the mathematical study of replicability as an algorithmic pr...
03/23/2021

An Exponential Lower Bound for Linearly-Realizable MDPs with Constant Suboptimality Gap

A fundamental question in the theory of reinforcement learning is: suppo...
10/10/2022

Reaching Goals is Hard: Settling the Sample Complexity of the Stochastic Shortest Path

We study the sample complexity of learning an ϵ-optimal policy in the St...
04/24/2019

Stochastic Lipschitz Q-Learning

In an episodic Markov Decision Process (MDP) problem, an online algorith...
12/07/2019

No-Regret Exploration in Goal-Oriented Reinforcement Learning

Many popular reinforcement learning problems (e.g., navigation in a maze...
07/27/2020

Fast active learning for pure exploration in reinforcement learning

Realistic environments often provide agents with very limited feedback. ...

Please sign up or login with your details

Forgot password? Click here to reset