Fast exploration and learning of latent graphs with aliased observations

03/13/2023
by Miguel Lázaro-Gredilla, et al.

Consider this scenario: an agent navigates a latent graph by performing actions that take it from one node to another. The chosen action determines the probability distribution over the next visited node. At each node, the agent receives an observation, but this observation is not unique, so it does not identify the node, making the problem aliased. The purpose of this work is to provide a policy that approximately maximizes exploration efficiency (i.e., how well the graph is recovered for a given exploration budget). In the unaliased case, we show improved performance w.r.t. state-of-the-art reinforcement learning baselines. For the aliased case we are not aware of suitable baselines and instead show faster recovery w.r.t. a random policy for a wide variety of topologies, and exponentially faster recovery than a random policy for challenging topologies. We dub the algorithm eFeX (from eFficient eXploration).
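To make the setup concrete, here is a minimal Python sketch of the environment described above: an action-conditioned transition distribution over latent nodes, plus aliased observation symbols, explored by the random-policy baseline the abstract compares against. All names and constants here are illustrative assumptions; this is not the paper's eFeX implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes: fewer observation symbols than nodes => aliasing.
n_nodes, n_actions, n_symbols = 20, 4, 5

# T[a, i] is the probability distribution over the next node when taking
# action a at node i, i.e., the latent graph the agent must recover.
T = rng.dirichlet(np.ones(n_nodes), size=(n_actions, n_nodes))

# Several nodes emit the same symbol, so an observation alone does not
# identify the current node.
obs_of_node = rng.integers(0, n_symbols, size=n_nodes)

def random_walk(budget, start=0):
    """Baseline explorer: uniformly random actions for a fixed budget.
    Returns the (action, observation) sequence the learner actually sees."""
    node, trace = start, []
    for _ in range(budget):
        a = int(rng.integers(n_actions))
        node = int(rng.choice(n_nodes, p=T[a, node]))
        trace.append((a, int(obs_of_node[node])))
    return trace

trace = random_walk(budget=1000)  # exploration budget = number of steps
```

An efficient policy such as eFeX would replace the uniform action choice above; per the abstract, exploration efficiency is then scored by how well the graph is recovered for a given budget, e.g., how closely transition statistics estimated from `trace` match T.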


Related research

11/13/2019
Kinematic State Abstraction and Provably Efficient Rich-Observation Reinforcement Learning
We present an algorithm, HOMER, for exploration and reinforcement learni...

03/16/2023
Conditionally Optimistic Exploration for Cooperative Deep Multi-Agent Reinforcement Learning
Efficient exploration is critical in cooperative deep Multi-Agent Reinfo...

05/13/2019
Learning and Exploiting Multiple Subgoals for Fast Exploration in Hierarchical Reinforcement Learning
Hierarchical Reinforcement Learning (HRL) exploits temporally extended a...

07/05/2018
Goal-oriented Trajectories for Efficient Exploration
Exploration is a difficult challenge in reinforcement learning and even ...

02/27/2010
Learning from Logged Implicit Exploration Data
We provide a sound and consistent foundation for the use of nonrandom ex...

09/26/2020
SEMI: Self-supervised Exploration via Multisensory Incongruity
Efficient exploration is a long-standing problem in reinforcement learni...

04/17/2019
PLOTS: Procedure Learning from Observations using Subtask Structure
In many cases an intelligent agent may want to learn how to mimic a sing...
