EMI: Exploration with Mutual Information Maximizing State and Action Embeddings

10/02/2018
by HyoungSeok Kim, et al.

Policy optimization struggles when the reward signal is very sparse: the agent essentially performs random search until it accidentally stumbles upon a rewarding state or the goal state. Recent works use intrinsic motivation to guide exploration via generative models, predictive forward models, or more ad hoc measures of surprise. We propose EMI, an exploration method that constructs embedding representations of states and actions. It does not rely on generative decoding of the full observation; instead, it extracts predictive signals that guide exploration through forward prediction in the representation space. Our experiments show state-of-the-art performance on challenging locomotion tasks with continuous control and on image-based exploration tasks with discrete actions on Atari.
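The core idea of exploration via forward prediction in a learned representation space can be sketched as follows. This is a minimal illustration, not the paper's implementation: the embedding maps `phi` (states) and `psi` (actions) are fixed random linear maps here, whereas in EMI they are trained by mutual information maximization, and the forward model and dimensions are hypothetical choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions for raw observations, actions, and embeddings.
OBS_DIM, ACT_DIM, EMB_DIM = 8, 2, 4

# Stand-ins for the learned state embedding phi and action embedding psi.
# In EMI these are trained to maximize mutual information; here they are
# fixed random linear maps purely for illustration.
PHI = rng.normal(size=(EMB_DIM, OBS_DIM))
PSI = rng.normal(size=(EMB_DIM, ACT_DIM))

def phi(obs):
    return PHI @ obs

def psi(action):
    return PSI @ action

def intrinsic_reward(obs, action, next_obs, forward_model):
    """Prediction error of a forward model in embedding space.

    A large error means the transition was poorly predicted (surprising),
    so the agent receives a large exploration bonus; well-modeled
    transitions earn little bonus.
    """
    predicted = forward_model(phi(obs), psi(action))
    return float(np.linalg.norm(phi(next_obs) - predicted))

# A toy forward model: predict the next state embedding as the current
# state embedding shifted by the action embedding.
linear_forward = lambda z_s, z_a: z_s + z_a

obs = rng.normal(size=OBS_DIM)
action = rng.normal(size=ACT_DIM)
next_obs = rng.normal(size=OBS_DIM)
bonus = intrinsic_reward(obs, action, next_obs, linear_forward)
print(f"exploration bonus: {bonus:.3f}")
```

In training, this bonus would be added to the (sparse) extrinsic reward at each step, and the embeddings and forward model would be updated jointly from collected transitions.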


Related research

- Empowerment-driven Exploration using Mutual Information Estimation (10/11/2018)
- Go Beyond Imagination: Maximizing Episodic Reachability with World Models (08/25/2023)
- Implicit Generative Modeling for Efficient Exploration (11/19/2019)
- Latent World Models For Intrinsically Motivated Exploration (10/05/2020)
- Don't Do What Doesn't Matter: Intrinsic Motivation with Action Usefulness (05/20/2021)
- QXplore: Q-learning Exploration by Maximizing Temporal Difference Error (06/19/2019)
- Locally Persistent Exploration in Continuous Control Tasks with Sparse Rewards (12/26/2020)
