Improving Learning from Demonstrations by Learning from Experience

11/16/2021
by   Haofeng Liu, et al.
0

How to make imitation learning more general when demonstrations are relatively limited has been a persistent problem in reinforcement learning (RL). Poor demonstrations lead to narrow and biased date distribution, non-Markovian human expert demonstration makes it difficult for the agent to learn, and over-reliance on sub-optimal trajectories can make it hard for the agent to improve its performance. To solve these problems we propose a new algorithm named TD3fG that can smoothly transition from learning from experts to learning from experience. Our algorithm achieves good performance in the MUJOCO environment with limited and sub-optimal demonstrations. We use behavior cloning to train the network as a reference action generator and utilize it in terms of both loss function and exploration noise. This innovation can help agents extract a priori knowledge from demonstrations while reducing the detrimental effects of the poor Markovian properties of the demonstrations. It has a better performance compared to the BC+ fine-tuning and DDPGfD approach, especially when the demonstrations are relatively limited. We call our method TD3fG meaning TD3 from a generator.

READ FULL TEXT

page 1

page 2

page 3

page 4

03/29/2023

Learning Complicated Manipulation Skills via Deterministic Policy with Limited Demonstrations

Combined with demonstrations, deep reinforcement learning can efficientl...
03/21/2022

Self-Imitation Learning from Demonstrations

Despite the numerous breakthroughs achieved with Reinforcement Learning ...
02/28/2022

Pedagogical Demonstrations and Pragmatic Learning in Artificial Tutor-Learner Interactions

When demonstrating a task, human tutors pedagogically modify their behav...
06/10/2020

Bayesian Experience Reuse for Learning from Multiple Demonstrators

Learning from demonstrations (LfD) improves the exploration efficiency o...
09/03/2019

Making Efficient Use of Demonstrations to Solve Hard Exploration Problems

This paper introduces R2D3, an agent that makes efficient use of demonst...
10/10/2021

A Closer Look at Advantage-Filtered Behavioral Cloning in High-Noise Datasets

Recent Offline Reinforcement Learning methods have succeeded in learning...

Please sign up or login with your details

Forgot password? Click here to reset