Learning Adaptive Exploration Strategies in Dynamic Environments Through Informed Policy Regularization

We study the problem of learning exploration-exploitation strategies that effectively adapt to dynamic environments, where the task may change over time. While RNN-based policies could in principle represent such strategies, in practice their training time is prohibitive and the learning process often converges to poor solutions. In this paper, we consider the case where the agent has access to a description of the task (e.g., a task id or task parameters) at training time, but not at test time. We propose a novel algorithm that regularizes the training of an RNN-based policy using informed policies trained to maximize the reward in each task. This dramatically reduces the sample complexity of training RNN-based policies, without losing their representational power. As a result, our method learns exploration strategies that efficiently balance between gathering information about the unknown and changing task and maximizing the reward over time. We test the performance of our algorithm in a variety of environments where tasks may vary within each episode.

READ FULL TEXT
research
03/15/2019

Adaptive Variance for Changing Sparse-Reward Environments

Robots that are trained to perform a task in a fixed environment often f...
research
11/11/2020

Proximal Policy Optimization via Enhanced Exploration Efficiency

Proximal policy optimization (PPO) algorithm is a deep reinforcement lea...
research
06/20/2023

Informed POMDP: Leveraging Additional Information in Model-Based RL

In this work, we generalize the problem of learning through interaction ...
research
03/05/2019

Learning Exploration Policies for Navigation

Numerous past works have tackled the problem of task-driven navigation. ...
research
05/18/2021

Meta-Reinforcement Learning by Tracking Task Non-stationarity

Many real-world domains are subject to a structured non-stationarity whi...
research
03/08/2023

Learning Exploration Strategies to Solve Real-World Marble Runs

Tasks involving locally unstable or discontinuous dynamics (such as bifu...
research
03/09/2023

Reward Informed Dreamer for Task Generalization in Reinforcement Learning

A long-standing goal of reinforcement learning is that algorithms can le...

Please sign up or login with your details

Forgot password? Click here to reset