Layered State Discovery for Incremental Autonomous Exploration

02/07/2023
by Liyu Chen, et al.

We study the autonomous exploration (AX) problem proposed by Lim and Auer (2012). In this setting, the objective is to discover a set of $\epsilon$-optimal policies reaching a set $\mathcal{S}_L^{\rightarrow}$ of incrementally $L$-controllable states. We introduce a novel layered decomposition of the set of incrementally $L$-controllable states that is based on the iterative application of a state-expansion operator. We leverage these results to design Layered Autonomous Exploration (LAE), a novel algorithm for AX that attains a sample complexity of $\tilde{\mathcal{O}}\big(L S^{\rightarrow}_{L(1+\epsilon)} \Gamma_{L(1+\epsilon)} A \ln^{12}(S^{\rightarrow}_{L(1+\epsilon)})/\epsilon^2\big)$, where $S^{\rightarrow}_{L(1+\epsilon)}$ is the number of states that are incrementally $L(1+\epsilon)$-controllable, $A$ is the number of actions, and $\Gamma_{L(1+\epsilon)}$ is the branching factor of the transitions over such states. LAE improves over the algorithm of Tarbouriech et al. (2020a) by a factor of $L^2$, and it is the first algorithm for AX that works in a countably-infinite state space. Moreover, we show that, under a certain identifiability assumption, LAE achieves a minimax-optimal sample complexity of $\tilde{\mathcal{O}}\big(L S^{\rightarrow}_{L} A \ln^{12}(S^{\rightarrow}_{L})/\epsilon^2\big)$, outperforming existing algorithms and matching, for the first time, the lower bound proved by Cai et al. (2022) up to logarithmic factors.
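
To give some intuition for a layered decomposition built by repeatedly applying a state-expansion operator, here is a minimal Python sketch. It is not the LAE algorithm and not the paper's operator: it assumes a known, deterministic transition model (the paper's setting is unknown and stochastic), and the names `expand`, `layers`, and `transitions` are hypothetical. In this toy version, one expansion adds every state reachable from the initial state in at most L steps along a path whose intermediate states are already in the known set.

```python
from collections import deque


def expand(known, transitions, s0, L):
    """One application of a toy state-expansion operator (illustrative only).

    Returns `known` plus every state reachable from s0 in at most L steps
    by a path whose intermediate states all lie in `known`.
    `transitions[s][a]` is the deterministic successor of state s under action a.
    """
    dist = {s0: 0}
    queue = deque([s0])
    reached = set(known)
    while queue:
        s = queue.popleft()
        if dist[s] == L:
            continue  # step budget exhausted along this path
        for nxt in transitions.get(s, {}).values():
            if nxt in dist:
                continue
            dist[nxt] = dist[s] + 1
            reached.add(nxt)
            if nxt in known:
                queue.append(nxt)  # only known states may be traversed
    return reached


def layers(transitions, s0, L):
    """Iterate `expand` from {s0} to a fixed point; return the new states per round."""
    known = {s0}
    out = [{s0}]
    while True:
        new = expand(known, transitions, s0, L)
        if new == known:
            return out
        out.append(new - known)
        known = new


# Hypothetical toy model: 0 -> {1, 2} -> 3 -> 4, with L = 2.
toy = {0: {"a": 1, "b": 2}, 1: {"a": 3}, 2: {"a": 3}, 3: {"a": 4}}
print(layers(toy, s0=0, L=2))
```

On this toy model the layers are {0}, then {1, 2}, then {3}; state 4 is three steps from the initial state and is never added when L = 2. In a deterministic model the final set is simply the states within L steps of the start, so the sketch only illustrates the layer-by-layer construction, not the stochastic case the paper addresses.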
