Efficient Exploration with Self-Imitation Learning via Trajectory-Conditioned Policy

07/24/2019
by Yijie Guo, et al.

This paper proposes a method for learning a trajectory-conditioned policy to imitate diverse demonstrations from the agent's own past experiences. We demonstrate that such self-imitation drives exploration in diverse directions and increases the chance of finding a globally optimal solution in reinforcement learning problems, especially when the reward is sparse and deceptive. Our method significantly outperforms existing self-imitation learning and count-based exploration methods on various sparse-reward reinforcement learning tasks with local optima. In particular, we report a state-of-the-art score of more than 25,000 points on Montezuma's Revenge without using expert demonstrations or resetting to arbitrary states.
