Improving Learning from Demonstrations by Learning from Experience

by   Haofeng Liu, et al.

How to make imitation learning more general when demonstrations are relatively limited has been a persistent problem in reinforcement learning (RL). Poor demonstrations lead to narrow and biased date distribution, non-Markovian human expert demonstration makes it difficult for the agent to learn, and over-reliance on sub-optimal trajectories can make it hard for the agent to improve its performance. To solve these problems we propose a new algorithm named TD3fG that can smoothly transition from learning from experts to learning from experience. Our algorithm achieves good performance in the MUJOCO environment with limited and sub-optimal demonstrations. We use behavior cloning to train the network as a reference action generator and utilize it in terms of both loss function and exploration noise. This innovation can help agents extract a priori knowledge from demonstrations while reducing the detrimental effects of the poor Markovian properties of the demonstrations. It has a better performance compared to the BC+ fine-tuning and DDPGfD approach, especially when the demonstrations are relatively limited. We call our method TD3fG meaning TD3 from a generator.


page 1

page 2

page 3

page 4


Learning Complicated Manipulation Skills via Deterministic Policy with Limited Demonstrations

Combined with demonstrations, deep reinforcement learning can efficientl...

Self-Imitation Learning from Demonstrations

Despite the numerous breakthroughs achieved with Reinforcement Learning ...

Pedagogical Demonstrations and Pragmatic Learning in Artificial Tutor-Learner Interactions

When demonstrating a task, human tutors pedagogically modify their behav...

Bayesian Experience Reuse for Learning from Multiple Demonstrators

Learning from demonstrations (LfD) improves the exploration efficiency o...

Making Efficient Use of Demonstrations to Solve Hard Exploration Problems

This paper introduces R2D3, an agent that makes efficient use of demonst...

A Closer Look at Advantage-Filtered Behavioral Cloning in High-Noise Datasets

Recent Offline Reinforcement Learning methods have succeeded in learning...

Please sign up or login with your details

Forgot password? Click here to reset