Adaptive Multi-Goal Exploration

by Jean Tarbouriech, et al.

We introduce a generic strategy for provably efficient multi-goal exploration. It relies on AdaGoal, a novel goal selection scheme that is based on a simple constrained optimization problem, which adaptively targets goal states that are neither too difficult nor too easy to reach according to the agent's current knowledge. We show how AdaGoal can be used to tackle the objective of learning an ϵ-optimal goal-conditioned policy for all the goal states that are reachable within L steps in expectation from a reference state s_0 in a reward-free Markov decision process. In the tabular case with S states and A actions, our algorithm requires Õ(L^3 S A ϵ^-2) exploration steps, which is nearly minimax optimal. We also readily instantiate AdaGoal in linear mixture Markov decision processes, which yields the first goal-oriented PAC guarantee with linear function approximation. Beyond its strong theoretical guarantees, AdaGoal is anchored in the high-level algorithmic structure of existing methods for goal-conditioned deep reinforcement learning.
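
The abstract describes AdaGoal's goal selection only at a high level, so the following is a minimal sketch under stated assumptions rather than the authors' procedure. It assumes the agent maintains, for each candidate goal, an estimated expected number of steps to reach it from s_0 and a scalar uncertainty score, and it picks the most uncertain goal among those currently estimated to be reachable within L steps. The names `select_goal`, `est_distance`, and `uncertainty` are hypothetical and do not come from the paper.

```python
from typing import Optional, Sequence


def select_goal(
    est_distance: Sequence[float],
    uncertainty: Sequence[float],
    L: float,
) -> Optional[int]:
    """Pick a goal index by a simple constrained maximization.

    Assumed inputs (hypothetical, for illustration only):
      est_distance[g]: current estimate of the expected steps from s_0 to goal g
      uncertainty[g]:  current uncertainty score for goal g (higher = less known)
      L:               reachability threshold from the problem statement

    Returns the index of the most uncertain goal whose estimated distance
    is at most L, or None if no candidate satisfies the constraint.
    """
    best: Optional[int] = None
    for g, (d, u) in enumerate(zip(est_distance, uncertainty)):
        if d > L:
            # Constraint violated: goal currently looks too hard to reach.
            continue
        if best is None or u > uncertainty[best]:
            # Among feasible goals, prefer the one we know least about,
            # which steers away from goals that are already well learned.
            best = g
    return best


if __name__ == "__main__":
    # Goal 2 is the most uncertain but is estimated to lie beyond L.
    print(select_goal([1.0, 3.0, 10.0], [0.1, 0.8, 5.0], L=4.0))
```

In this toy call, the first goal is feasible but already well understood, the third is highly uncertain but estimated to be too far, and the second is the feasible goal with the most left to learn, which matches the "neither too difficult nor too easy" intuition in the abstract.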


