Identifying Reusable Macros for Efficient Exploration via Policy Compression

by Francisco M. Garcia, et al.

Reinforcement learning agents often need to solve not a single task, but several tasks pertaining to the same domain; in particular, each task corresponds to an MDP drawn from a family of related MDPs (a domain). An agent learning in this setting should be able to exploit policies it has learned in the past, on a given set of sample tasks, in order to more rapidly acquire policies for novel tasks. Consider, for instance, a navigation problem where an agent may have to learn to navigate different (but related) mazes. Even though these correspond to distinct tasks (since the goal and starting locations of the agent may change, as well as the maze configuration itself), their solutions share common properties---e.g., in order to reach distant areas of the maze, an agent should not move in circles. After an agent has learned to solve a few sample tasks, it may be possible to leverage the acquired experience to facilitate solving novel tasks from the same domain. Our work is motivated by the observation that trajectory samples from optimal policies for tasks belonging to a common domain often reveal underlying patterns that are useful for solving novel tasks. We propose an optimization objective that characterizes the problem of learning reusable temporally extended actions (macros). We introduce a computationally tractable surrogate objective that is equivalent to finding macros that allow for maximal compression of a given set of sampled trajectories. We develop a compression-based approach for obtaining such macros and propose an exploration strategy that takes advantage of them. We show that meaningful behavioral patterns can be identified from sample policies over discrete and continuous action spaces, and present evidence that the proposed exploration strategy improves learning time on novel tasks.
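The abstract's core idea---that good macros are exactly the action subsequences enabling maximal compression of sampled trajectories---can be illustrated with a toy sketch. The paper's actual objective and algorithm are not reproduced here; the snippet below is a minimal byte-pair-encoding-style merge over discrete action sequences, and all names (`extract_macros`, `macro_0`, the toy trajectories) are illustrative assumptions, not the authors' method.

```python
from collections import Counter

def most_frequent_pair(trajectories):
    """Count adjacent action pairs across all trajectories."""
    counts = Counter()
    for traj in trajectories:
        for a, b in zip(traj, traj[1:]):
            counts[(a, b)] += 1
    return counts.most_common(1)[0] if counts else None

def merge_pair(traj, pair, symbol):
    """Replace every non-overlapping occurrence of `pair` with `symbol`."""
    out, i = [], 0
    while i < len(traj):
        if i + 1 < len(traj) and (traj[i], traj[i + 1]) == pair:
            out.append(symbol)
            i += 2
        else:
            out.append(traj[i])
            i += 1
    return out

def extract_macros(trajectories, num_macros):
    """Greedily merge the most frequent adjacent pair into a new macro
    symbol (BPE-style); each macro expands to primitive actions."""
    macros = {}
    for k in range(num_macros):
        top = most_frequent_pair(trajectories)
        if top is None or top[1] < 2:  # stop when nothing repeats
            break
        pair, _ = top
        name = f"macro_{k}"
        # Expand nested macros so every macro maps to primitive actions.
        expand = lambda a: macros.get(a, [a])
        macros[name] = expand(pair[0]) + expand(pair[1])
        trajectories = [merge_pair(t, pair, name) for t in trajectories]
    return macros

# Toy maze trajectories: repeated "right, right, up" motifs compress
# into reusable macros.
trajs = [["R", "R", "U", "R", "R", "U", "L"],
         ["R", "R", "U", "D", "R", "R", "U"]]
print(extract_macros(trajs, 2))
```

In a multi-task setting, macros mined this way from solved sample tasks could then be added to the agent's action set when exploring a novel task, so that a single exploratory choice covers a temporally extended, frequently useful behavior.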


Related research:

- Learning Reusable Options for Multi-Task Reinforcement Learning
- A Meta-MDP Approach to Exploration for Lifelong Reinforcement Learning
- Task-agnostic Exploration in Reinforcement Learning
- Maximum Entropy Gain Exploration for Long Horizon Multi-goal Reinforcement Learning
- Universal Psychometrics Tasks: difficulty, composition and decomposition
- Learn Dynamic-Aware State Embedding for Transfer Learning
- Active Information Acquisition
