Reinforcement Learning from Imperfect Demonstrations

02/14/2018
by Yang Gao, et al.

Robust real-world learning should benefit from both demonstrations and interactions with the environment. Current approaches to learning from demonstration and reward perform supervised learning on expert demonstration data and use reinforcement learning to further improve performance based on the reward received from the environment. These tasks have divergent losses that are difficult to optimize jointly, and such methods can be very sensitive to noisy demonstrations. We propose a unified reinforcement learning algorithm, Normalized Actor-Critic (NAC), that effectively normalizes the Q-function, reducing the Q-values of actions unseen in the demonstration data. NAC learns an initial policy network from demonstrations and refines the policy in the environment, surpassing the demonstrator's performance. Crucially, both learning from demonstration and interactive refinement use the same objective, unlike prior approaches that combine distinct supervised and reinforcement losses. This makes NAC robust to suboptimal demonstration data since the method is not forced to mimic all of the examples in the dataset. We show that our unified reinforcement learning algorithm can learn robustly and outperform existing baselines when evaluated on several realistic driving games.
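
The normalization effect described in the abstract can be illustrated with a small tabular sketch. This is not the paper's update rule, which the abstract does not state. It assumes a softmax (maximum-entropy) policy over Q-values with a hypothetical temperature `alpha`, and the names `soft_value`, `policy_from_q`, and `normalized_step` are made up for illustration. The sketch only shows why subtracting a log-sum-exp value term pushes down the Q-values of actions that do not appear in the demonstration.

```python
import math

def soft_value(q_values, alpha=1.0):
    """Soft value V = alpha * log(sum(exp(Q / alpha))), computed stably."""
    m = max(q_values)
    return m + alpha * math.log(sum(math.exp((q - m) / alpha) for q in q_values))

def policy_from_q(q_values, alpha=1.0):
    """Softmax policy pi(a) = exp((Q(a) - V) / alpha); probabilities sum to 1."""
    v = soft_value(q_values, alpha)
    return [math.exp((q - v) / alpha) for q in q_values]

def normalized_step(q_values, demo_action, lr=0.5, alpha=1.0):
    """One gradient-ascent step on Q(demo_action) - V for a single state.

    Assumption (illustrative objective, not taken from the paper):
    d(Q(a*) - V) / dQ(a) = 1[a == a*] - pi(a), so the demonstrated
    action's value rises while every other action's value falls.
    """
    pi = policy_from_q(q_values, alpha)
    return [
        q + lr * ((1.0 if a == demo_action else 0.0) - pi[a])
        for a, q in enumerate(q_values)
    ]

# Three actions with equal initial Q-values; the demonstration chose action 0.
q_before = [0.0, 0.0, 0.0]
q_after = normalized_step(q_before, demo_action=0)
print(q_after)
print(policy_from_q(q_after))
```

Without the `- V` term, the same step would raise only the demonstrated action's value and leave the unseen actions untouched. With it, the unseen actions are lowered as a side effect of the normalization.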

research
11/02/2020

Shaping Rewards for Reinforcement Learning with Imperfect Demonstrations using Generative Models

The potential benefits of model-free reinforcement learning to real robo...
research
10/09/2019

Integrating Behavior Cloning and Reinforcement Learning for Improved Performance in Sparse Reward Environments

This paper investigates how to efficiently transition and update policie...
research
02/18/2021

Learning Memory-Dependent Continuous Control from Demonstrations

Efficient exploration has presented a long-standing challenge in reinfor...
research
04/12/2017

Deep Q-learning from Demonstrations

Deep reinforcement learning (RL) has achieved several high profile succe...
research
06/14/2020

Reinforcement Learning with Supervision from Noisy Demonstrations

Reinforcement learning has achieved great success in various application...
research
10/17/2020

Learning from Suboptimal Demonstration via Self-Supervised Reward Regression

Learning from Demonstration (LfD) seeks to democratize robotics by enabl...
research
10/16/2017

Gradient-free Policy Architecture Search and Adaptation

We develop a method for policy architecture search and adaptation via gr...
