Learning and Exploiting Multiple Subgoals for Fast Exploration in Hierarchical Reinforcement Learning

05/13/2019
by   Libo Xing, et al.

Hierarchical Reinforcement Learning (HRL) exploits temporally extended actions, or options, to make decisions from a higher-level perspective, alleviating the sparse reward problem, one of the most challenging problems in reinforcement learning. Most existing HRL algorithms require either significant manual design specific to the environment or enormous amounts of exploration to learn options automatically from data. To achieve fast exploration without manual design, we devise a multi-goal HRL algorithm consisting of a high-level policy, the Manager, and a low-level policy, the Worker. The Manager provides the Worker with multiple subgoals at each time step; each subgoal corresponds to an option for controlling the environment. Although the agent may show some confusion early in training, since it is guided by three diverse subgoals, its behavior policy quickly learns how to respond to the high-level controller's multiple subgoals on different occasions. By exploiting multiple subgoals, exploration efficiency improves significantly. We conduct experiments in Atari's Montezuma's Revenge, a well-known sparse reward environment, and achieve the same performance as state-of-the-art HRL methods at a substantially reduced training time cost.
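The Manager/Worker interface described above can be sketched in a toy form. The following is a minimal, hypothetical illustration, not the paper's implementation: the Manager is a stub that samples k = 3 candidate subgoals per step (matching the "three diverse subgoals" in the abstract), and the Worker is a simple tabular epsilon-greedy policy, keeping one set of action values per subgoal, that learns to respond to whichever subgoal it is given. All names, the discrete subgoal space, and the toy intrinsic reward are assumptions for illustration only.

```python
import random


class Manager:
    """High-level policy: proposes several candidate subgoals per step.

    In the paper's setting the Manager is learned; here it is a stub that
    samples k distinct subgoals from a small discrete set, purely to
    illustrate the multi-subgoal interface (hypothetical simplification).
    """

    def __init__(self, subgoal_space, k=3, seed=0):
        self.subgoal_space = list(subgoal_space)
        self.k = k
        self.rng = random.Random(seed)

    def propose(self, state):
        # Emit k distinct subgoals for the current state.
        return self.rng.sample(self.subgoal_space, self.k)


class Worker:
    """Low-level policy conditioned on (state, subgoal).

    A tabular epsilon-greedy stand-in for the learned behavior policy: it
    maintains action values per (state, subgoal) pair, so it can learn a
    different response for each subgoal the Manager provides.
    """

    def __init__(self, actions, eps=0.3, alpha=0.5, seed=1):
        self.actions = list(actions)
        self.q = {}  # (state, subgoal, action) -> value
        self.eps = eps
        self.alpha = alpha
        self.rng = random.Random(seed)

    def act(self, state, subgoal):
        if self.rng.random() < self.eps:
            return self.rng.choice(self.actions)
        return max(self.actions,
                   key=lambda a: self.q.get((state, subgoal, a), 0.0))

    def update(self, state, subgoal, action, intrinsic_reward):
        key = (state, subgoal, action)
        old = self.q.get(key, 0.0)
        self.q[key] = old + self.alpha * (intrinsic_reward - old)


# Tiny interaction loop: at each step the Worker pursues every proposed
# subgoal and receives an intrinsic reward of 1.0 when its action
# "matches" the subgoal -- a toy stand-in for reaching the subgoal state.
manager = Manager(subgoal_space=range(4), k=3)
worker = Worker(actions=range(4))
state = 0
for step in range(300):
    for sg in manager.propose(state):
        a = worker.act(state, sg)
        r = 1.0 if a == sg else 0.0  # toy intrinsic reward
        worker.update(state, sg, a, r)

# After training, the Worker's greedy action for each subgoal should
# match that subgoal, showing one policy serving multiple subgoals.
greedy = {sg: max(range(4),
                  key=lambda a: worker.q.get((0, sg, a), 0.0))
          for sg in range(4)}
```

The point of the sketch is the interface, not the learning rule: a single behavior policy conditioned on the subgoal can serve all of the Manager's proposals at once, which is what lets the multi-subgoal scheme explore several options in parallel instead of committing to one.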

