Learning Self-Correctable Policies and Value Functions from Demonstrations with Negative Sampling

07/12/2019
by   Yuping Luo, et al.

Imitation learning, followed by reinforcement learning, is a promising paradigm for solving complex control tasks sample-efficiently. However, learning from demonstrations often suffers from the covariate shift problem, which causes cascading errors in the learned policy. We introduce a notion of conservatively-extrapolated value functions, which provably lead to policies with self-correction. We design an algorithm, Value Iteration with Negative Sampling (VINS), that learns such conservatively-extrapolated value functions in practice. We show that VINS can correct mistakes of the behavioral cloning policy on simulated robotics benchmark tasks. We also propose using VINS to initialize a reinforcement learning algorithm, which significantly outperforms prior work in sample efficiency.
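The core idea of conservative extrapolation can be illustrated with a toy sketch. Below, demonstration states lie on a line segment and a state off that manifold is assigned the best demonstrated value minus a penalty proportional to its distance from the demonstrations, so greedy ascent on the value function pushes the agent back toward (and along) the demonstrated states. This is a minimal illustration only; the demonstration data, the penalty coefficient `MU`, and the nearest-demonstration form of the value function are assumptions for the sketch, not the paper's actual learned parameterization.

```python
import numpy as np

# Toy demonstration: states on a line segment in 2-D, with value equal to
# progress toward the goal at (1, 0). (Hypothetical data for illustration.)
demo_states = np.stack([np.linspace(0.0, 1.0, 50), np.zeros(50)], axis=1)
demo_values = np.linspace(0.0, 1.0, 50)

MU = 2.0  # penalty per unit distance from the demonstrations (assumed)

def conservative_value(s):
    """Conservatively-extrapolated value: the best demonstrated value,
    discounted by MU times the distance to that demonstration state.
    Off-manifold states are thus valued below nearby demonstrated ones."""
    dists = np.linalg.norm(demo_states - s, axis=1)
    return np.max(demo_values - MU * dists)

def correction_direction(s, eps=1e-3):
    """Finite-difference gradient of the value function. Ascending it
    steers a drifted state back toward the demonstration manifold."""
    g = np.zeros(2)
    for k in range(2):
        e = np.zeros(2)
        e[k] = eps
        g[k] = (conservative_value(s + e) - conservative_value(s - e)) / (2 * eps)
    return g

s_off = np.array([0.5, 0.3])          # a state that drifted off the demos
print(conservative_value(s_off))       # lower than the nearby demo values
print(correction_direction(s_off))     # second component is negative: push y back to 0
```

Running this, the correction direction at the drifted state points back toward the demonstration line (negative second component) while still moving toward the goal, which is the self-correction behavior the conservative extrapolation is designed to induce.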
