DisCoRL: Continual Reinforcement Learning via Policy Distillation

07/11/2019
by René Traoré, et al.

In multi-task reinforcement learning there are two main challenges: at training time, learning different policies with a single model; at test time, inferring which of those policies to apply without an external signal. In continual reinforcement learning a third challenge arises: learning tasks sequentially without forgetting the previous ones. In this paper, we tackle these challenges by proposing DisCoRL, an approach combining state representation learning and policy distillation. We experiment on a sequence of three simulated 2D navigation tasks with a three-wheel omni-directional robot. We also test the robustness of our approach by transferring the final policy to a real-life setting. The policy can solve all tasks and automatically infer which one to run.
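To make the policy distillation idea concrete, here is a minimal sketch of a generic distillation loss in plain Python. It is not the paper's implementation: the abstract does not specify the loss, so the KL-divergence objective, the function names (`softmax`, `kl_divergence`, `distillation_loss`), and the toy data are all assumptions chosen for illustration. The student is fitted to teacher action distributions pooled across tasks, with no task label in the input.

```python
import math


def softmax(logits, temperature=1.0):
    """Turn raw action scores into probabilities (numerically stable)."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions over the same actions."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def distillation_loss(teacher_probs_batch, student_logits_batch):
    """Mean KL between teacher action distributions and the student's outputs."""
    losses = [
        kl_divergence(teacher, softmax(student))
        for teacher, student in zip(teacher_probs_batch, student_logits_batch)
    ]
    return sum(losses) / len(losses)


# Hypothetical pooled samples from two task-specific teachers (assumed data).
# Each row is a teacher's distribution over four discrete actions.
teacher_probs = [
    [0.7, 0.1, 0.1, 0.1],  # sample recorded on the first task
    [0.1, 0.1, 0.7, 0.1],  # sample recorded on the second task
]

# An untrained student that outputs uniform action scores.
uniform_student_logits = [[0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0]]

# A student whose scores reproduce the teachers exactly.
matched_student_logits = [[math.log(p) for p in row] for row in teacher_probs]

loss_uniform = distillation_loss(teacher_probs, uniform_student_logits)
loss_matched = distillation_loss(teacher_probs, matched_student_logits)
```

In this sketch the loss is positive for the uniform student and approaches zero once the student reproduces every teacher's distribution, which is the sense in which a single model can absorb several task-specific policies.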


