DROID: Minimizing the Reality Gap using Single-Shot Human Demonstration

02/22/2021
by   Ya-Yen Tsai, et al.

Reinforcement learning (RL) has demonstrated great success in the past several years, but most of that success has been in simulated environments. One of the main challenges in transferring a policy learned in simulation to the real world is the discrepancy between the dynamics of the two environments. In prior work, Domain Randomization (DR) has been used to address this reality gap for both robotic locomotion and manipulation tasks. In this paper, we propose Domain Randomization Optimization IDentification (DROID), a novel framework that exploits a single-shot human demonstration to identify the simulator's distribution of dynamics parameters, and we apply it to training a policy on a door-opening task. Our results show that the proposed framework can identify the difference in dynamics between the simulated and real worlds, and thus improve policy transfer by optimizing the simulator's randomization ranges. We further show that, using these same identified parameters, our method can generalize the learned policy to different but related tasks.
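
To make the idea of identifying a randomization range from one demonstration concrete, the following is a minimal sketch and not the paper's implementation. Everything in it is an assumption made for illustration: the toy one-parameter hinge model (`simulate_door`), the synthetic "demonstration" generated from that same model, and the simple cross-entropy-style search standing in for whatever optimizer the full text uses. It fits a Gaussian over a single damping parameter so that simulated trajectories match the demonstrated one, then reads a randomization range off the fitted distribution.

```python
import random
import statistics


def simulate_door(damping, torque=1.0, inertia=1.0, dt=0.01, steps=200):
    """Hypothetical toy hinge: inertia * theta'' = torque - damping * theta'."""
    theta, omega, trajectory = 0.0, 0.0, []
    for _ in range(steps):
        alpha = (torque - damping * omega) / inertia
        omega += alpha * dt          # explicit Euler integration
        theta += omega * dt
        trajectory.append(theta)
    return trajectory


def trajectory_error(simulated, demonstrated):
    """Mean squared difference between two equal-length angle trajectories."""
    return sum((s - d) ** 2 for s, d in zip(simulated, demonstrated)) / len(demonstrated)


def identify_damping(demo, mean=2.0, std=1.0, iters=20, pop=40, elite=8, seed=0):
    """Cross-entropy-style search (an assumed stand-in optimizer):
    sample candidates, keep those whose rollouts best match the
    demonstration, and refit the Gaussian to the kept candidates."""
    rng = random.Random(seed)
    for _ in range(iters):
        samples = [max(0.0, rng.gauss(mean, std)) for _ in range(pop)]
        samples.sort(key=lambda b: trajectory_error(simulate_door(b), demo))
        best = samples[:elite]
        mean = statistics.fmean(best)
        std = max(statistics.pstdev(best), 1e-3)  # floor keeps sampling alive
    return mean, std


# Synthetic stand-in for a single real-world demonstration (assumed damping 0.8).
demo = simulate_door(damping=0.8)
mean, std = identify_damping(demo)

# Randomization range to use when training the policy in simulation.
low, high = max(0.0, mean - 2 * std), mean + 2 * std
print(f"identified damping mean={mean:.3f}, range=[{low:.3f}, {high:.3f}]")
```

In this sketch the search starts from a deliberately wrong guess (mean 2.0) and narrows toward the damping that reproduces the demonstration. A policy would then be trained with the damping sampled from `[low, high]` rather than from a hand-picked range.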

research
03/28/2019

How to pick the domain randomization parameters for sim-to-real transfer of reinforcement learning policies?

Recently, reinforcement learning (RL) algorithms have demonstrated remar...
research
09/17/2021

Dropout's Dream Land: Generalization from Learned Simulators to Reality

A World Model is a generative model used to simulate an environment. Wor...
research
09/10/2021

Closing the Sim2Real Gap in Dynamic Cloth Manipulation

Cloth manipulation is a challenging task due to the many degrees of free...
research
05/29/2023

Privileged Knowledge Distillation for Sim-to-Real Policy Generalization

Reinforcement Learning (RL) has recently achieved remarkable success in ...
research
03/03/2020

Traversing the Reality Gap via Simulator Tuning

The large demand for simulated data has made the reality gap a problem o...
research
09/23/2022

Quantification before Selection: Active Dynamics Preference for Robust Reinforcement Learning

Training a robust policy is critical for policy deployment in real-world...
research
06/29/2022

Online vs. Offline Adaptive Domain Randomization Benchmark

Physics simulators have shown great promise for conveniently learning re...