Layered State Discovery for Incremental Autonomous Exploration

02/07/2023
∙
by   Liyu Chen, et al.

We study the autonomous exploration (AX) problem proposed by Lim and Auer (2012). In this setting, the objective is to discover a set of $\epsilon$-optimal policies reaching a set $\mathcal{S}_L^{\rightarrow}$ of incrementally $L$-controllable states. We introduce a novel layered decomposition of the set of incrementally $L$-controllable states that is based on the iterative application of a state-expansion operator. We leverage these results to design Layered Autonomous Exploration (LAE), a novel algorithm for AX that attains a sample complexity of $\tilde{\mathcal{O}}(L S^{\rightarrow}_{L(1+\epsilon)} \Gamma_{L(1+\epsilon)} A \ln^{12}(S^{\rightarrow}_{L(1+\epsilon)})/\epsilon^2)$, where $S^{\rightarrow}_{L(1+\epsilon)}$ is the number of states that are incrementally $L(1+\epsilon)$-controllable, $A$ is the number of actions, and $\Gamma_{L(1+\epsilon)}$ is the branching factor of the transitions over such states. LAE improves over the algorithm of Tarbouriech et al. (2020a) by a factor of $L^2$, and it is the first algorithm for AX that works in a countably-infinite state space. Moreover, we show that, under a certain identifiability assumption, LAE achieves a minimax-optimal sample complexity of $\tilde{\mathcal{O}}(L S^{\rightarrow}_{L} A \ln^{12}(S^{\rightarrow}_{L})/\epsilon^2)$, outperforming existing algorithms and matching for the first time the lower bound proved by Cai et al. (2022) up to logarithmic factors.
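To give a rough feel for what "iterative application of a state-expansion operator" can look like, here is a minimal sketch in Python. It is not the LAE algorithm and not the paper's formal definition: it assumes the transition model `P` is fully known (LAE must learn it from samples), it assumes that stepping outside the current set `K` triggers a one-step reset to the start state `s0`, and it treats an undefined state-action pair as a reset. The names `expected_steps` and `discover_layers` are hypothetical.

```python
from math import inf

def expected_steps(P, actions, K, s0, goal, L, tol=1e-9, max_iter=10_000):
    """Lower-bounding value iteration for the minimum expected number of
    steps from s0 to `goal`, acting only in states of K.
    Assumption: any state outside K (other than goal) resets to s0 in one step."""
    V = {s: 0.0 for s in K}  # starting from 0, iterates increase monotonically

    for _ in range(max_iter):
        def cont(s2):
            if s2 == goal:
                return 0.0
            if s2 in K:
                return V[s2]
            return 1.0 + V[s0]  # assumed one-step reset to s0

        new_V = {}
        for s in K:
            # Assumption: an undefined (state, action) pair resets to s0.
            new_V[s] = min(
                1.0 + sum(p * cont(s2) for s2, p in P.get((s, a), {s0: 1.0}).items())
                for a in actions
            )
        delta = max(abs(new_V[s] - V[s]) for s in K)
        V = new_V
        if V[s0] > L:      # iterates are lower bounds, so goal is too far
            return V[s0]
        if delta < tol:    # numerically converged
            return V[s0]
    return inf             # no convergence: treat conservatively as unreachable

def discover_layers(P, actions, s0, L):
    """Grow K layer by layer: each round adds every neighbouring state that a
    policy restricted to the current K reaches in at most L expected steps."""
    K, layers = {s0}, [{s0}]
    while True:
        candidates = {s2 for (s, a), nxt in P.items() if s in K
                      for s2 in nxt if s2 not in K}
        new = {g for g in candidates
               if expected_steps(P, actions, K, s0, g, L) <= L + 1e-9}
        if not new:
            return layers
        K |= new
        layers.append(new)

# Toy deterministic chain 0 -> 1 -> 2 -> 3 with a single action "r".
P = {(0, "r"): {1: 1.0}, (1, "r"): {2: 1.0}, (2, "r"): {3: 1.0}}
layers = discover_layers(P, ["r"], s0=0, L=2)
```

On this toy chain with `L=2`, state 3 needs three steps from the start and is therefore never added, while states 1 and 2 enter in successive rounds. The sketch only illustrates the incremental structure; the sample-complexity guarantees stated above concern the unknown-model setting.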

∙ 05/22/2022

Near-Optimal Algorithms for Autonomous Exploration and Multi-Goal Stochastic Shortest Path

We revisit the incremental autonomous exploration problem proposed by Li...
∙ 12/29/2020

Improved Sample Complexity for Incremental Autonomous Exploration in MDPs

We investigate the exploration of an unknown environment when no reward ...
∙ 04/24/2019

Stochastic Lipschitz Q-Learning

In an episodic Markov Decision Process (MDP) problem, an online algorith...
∙ 03/16/2017

Minimax Regret Bounds for Reinforcement Learning

We consider the problem of provably optimal exploration in reinforcement...
∙ 04/07/2020

Two Results on Layered Pathwidth and Linear Layouts

Layered pathwidth is a new graph parameter studied by Bannister et al (2...
∙ 09/28/2020

Is Reinforcement Learning More Difficult Than Bandits? A Near-optimal Algorithm Escaping the Curse of Horizon

Episodic reinforcement learning and contextual bandits are two widely st...
∙ 06/19/2021

Nearly Minimax Optimal Adversarial Imitation Learning with Known and Unknown Transitions

This paper is dedicated to designing provably efficient adversarial imit...