Efficient Reinforcement Learning from Demonstration Using Local Ensemble and Reparameterization with Split and Merge of Expert Policies

05/23/2022
by   Yu Wang, et al.
0

The current work on reinforcement learning (RL) from demonstrations often assumes the demonstrations are samples from an optimal policy, an unrealistic assumption in practice. When demonstrations are generated by sub-optimal policies or have sparse state-action pairs, policy learned from sub-optimal demonstrations may mislead an agent with incorrect or non-local action decisions. We propose a new method called Local Ensemble and Reparameterization with Split and Merge of expert policies (LEARN-SAM) to improve efficiency and make better use of the sub-optimal demonstrations. First, LEARN-SAM employs a new concept, the lambda-function, based on a discrepancy measure between the current state to demonstrated states to "localize" the weights of the expert policies during learning. Second, LEARN-SAM employs a split-and-merge (SAM) mechanism by separating the helpful parts in each expert demonstration and regrouping them into new expert policies to use the demonstrations selectively. Both the lambda-function and SAM mechanism help boost the learning speed. Theoretically, we prove the invariant property of reparameterized policy before and after the SAM mechanism, providing theoretical guarantees for the convergence of the employed policy gradient method. We demonstrate the superiority of the LEARN-SAM method and its robustness with varying demonstration quality and sparsity in six experiments on complex continuous control problems of low to high dimensions, compared to existing methods on RL from demonstration.

READ FULL TEXT
research
06/17/2021

Learning from Demonstration without Demonstrations

State-of-the-art reinforcement learning (RL) algorithms suffer from high...
research
11/16/2021

Improving Learning from Demonstrations by Learning from Experience

How to make imitation learning more general when demonstrations are rela...
research
04/01/2020

Learning Sparse Rewarded Tasks from Sub-Optimal Demonstrations

Model-free deep reinforcement learning (RL) has demonstrated its superio...
research
05/04/2016

A Bayesian Approach to Policy Recognition and State Representation Learning

Learning from demonstration (LfD) is the process of building behavioral ...
research
10/10/2021

A Closer Look at Advantage-Filtered Behavioral Cloning in High-Noise Datasets

Recent Offline Reinforcement Learning methods have succeeded in learning...
research
09/26/2022

Enhanced Meta Reinforcement Learning using Demonstrations in Sparse Reward Environments

Meta reinforcement learning (Meta-RL) is an approach wherein the experie...
research
01/16/2019

ReNeg and Backseat Driver: Learning from Demonstration with Continuous Human Feedback

In autonomous vehicle (AV) control, allowing mistakes can be quite dange...

Please sign up or login with your details

Forgot password? Click here to reset