Model-Based Imitation Learning Using Entropy Regularization of Model and Policy

06/21/2022
by Eiji Uchibe, et al.

Approaches based on generative adversarial networks for imitation learning are promising because they are sample efficient in terms of expert demonstrations. However, training a generator requires many interactions with the actual environment because model-free reinforcement learning is adopted to update a policy. To improve sample efficiency, we propose model-based Entropy-Regularized Imitation Learning (MB-ERIL), which uses model-based reinforcement learning under the entropy-regularized Markov decision process to reduce the number of interactions with the actual environment. MB-ERIL uses two discriminators: a policy discriminator distinguishes the actions generated by a robot from expert ones, and a model discriminator distinguishes the counterfactual state transitions generated by the model from the actual ones. We derive structured discriminators so that the learning of the policy and the model is efficient. Computer simulations and real-robot experiments show that MB-ERIL achieves competitive performance and significantly improves sample efficiency compared to baseline methods.
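
The two-discriminator setup described in the abstract can be illustrated with a toy sketch. This is not the paper's method: the structured discriminators are replaced by plain logistic classifiers, and the one-dimensional dynamics, the two policies, the biased model, the hand-picked features, and every name in the code (`LogisticDiscriminator`, `true_step`, `model_step`, and so on) are assumptions made purely for illustration.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class LogisticDiscriminator:
    """Hypothetical stand-in: D(x) = sigmoid(w . x + b), trained with SGD."""

    def __init__(self, dim):
        self.w = [0.0] * dim
        self.b = 0.0

    def prob(self, x):
        return sigmoid(sum(wi * xi for wi, xi in zip(self.w, x)) + self.b)

    def fit(self, positives, negatives, rng, lr=0.1, epochs=300):
        data = [(x, 1.0) for x in positives] + [(x, 0.0) for x in negatives]
        for _ in range(epochs):
            rng.shuffle(data)
            for x, y in data:
                err = self.prob(x) - y  # gradient of cross-entropy w.r.t. the logit
                self.w = [wi - lr * err * xi for wi, xi in zip(self.w, x)]
                self.b -= lr * err

    def accuracy(self, positives, negatives):
        correct = sum(self.prob(x) > 0.5 for x in positives)
        correct += sum(self.prob(x) <= 0.5 for x in negatives)
        return correct / (len(positives) + len(negatives))


def true_step(s, a):
    """Assumed toy environment dynamics."""
    return s + a


def model_step(s, a):
    """Assumed learned model with a deliberate bias of 0.5."""
    return s + a + 0.5


rng = random.Random(0)
expert_sa, learner_sa = [], []  # inputs to the policy discriminator
real_tr, model_tr = [], []      # inputs to the model discriminator

for _ in range(200):
    # Expert (assumed): a = -s plus noise. Features: state, action, product.
    s = rng.uniform(-1.0, 1.0)
    a = -s + rng.gauss(0.0, 0.1)
    expert_sa.append([s, a, s * a])

    # Learner (assumed): a = 0.5 * s plus noise.
    s = rng.uniform(-1.0, 1.0)
    a = 0.5 * s + rng.gauss(0.0, 0.1)
    learner_sa.append([s, a, s * a])

    # Same (s, a): actual next state vs. the model's counterfactual one.
    real_tr.append([s, a, true_step(s, a) + rng.gauss(0.0, 0.05)])
    model_tr.append([s, a, model_step(s, a)])

# Policy discriminator: expert actions (label 1) vs. learner actions (label 0).
policy_disc = LogisticDiscriminator(dim=3)
policy_disc.fit(expert_sa, learner_sa, rng)
policy_acc = policy_disc.accuracy(expert_sa, learner_sa)

# Model discriminator: actual transitions (label 1) vs. model transitions (label 0).
model_disc = LogisticDiscriminator(dim=3)
model_disc.fit(real_tr, model_tr, rng)
model_acc = model_disc.accuracy(real_tr, model_tr)

print("policy discriminator training accuracy:", policy_acc)
print("model discriminator training accuracy:", model_acc)
```

In an adversarial imitation setting, the discriminator outputs would typically drive the updates of the policy and the model; that step, along with the entropy regularization, is omitted from this sketch.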

research
10/22/2020

Error Bounds of Imitating Policies and Environments

Imitation learning trains a policy by mimicking expert demonstrations. V...
research
08/17/2023

Regularizing Adversarial Imitation Learning Using Causal Invariance

Imitation learning methods are used to infer a policy in a Markov decisi...
research
08/17/2020

Imitation learning based on entropy-regularized forward and inverse reinforcement learning

This paper proposes Entropy-Regularized Imitation Learning (ERIL), which...
research
08/04/2022

Backward Imitation and Forward Reinforcement Learning via Bi-directional Model Rollouts

Traditional model-based reinforcement learning (RL) methods generate for...
research
04/03/2021

No Need for Interactions: Robust Model-Based Imitation Learning using Neural ODE

Interactions with either environments or expert policies during training...
research
06/11/2020

PAC Bounds for Imitation and Model-based Batch Learning of Contextual Markov Decision Processes

We consider the problem of batch multi-task reinforcement learning with ...
research
05/27/2021

Generative Adversarial Imitation Learning for Empathy-based AI

Generative adversarial imitation learning (GAIL) is a model-free algorit...
