Learning Exploration Strategies to Solve Real-World Marble Runs

03/08/2023
by   Alisa Allaire, et al.
0

Tasks involving locally unstable or discontinuous dynamics (such as bifurcations and collisions) remain challenging in robotics, because small variations in the environment can have a significant impact on task outcomes. For such tasks, learning a robust deterministic policy is difficult. We focus on structuring exploration with multiple stochastic policies based on a mixture of experts (MoE) policy representation that can be efficiently adapted. The MoE policy is composed of stochastic sub-policies that allow exploration of multiple distinct regions of the action space (or strategies) and a high-level selection policy to guide exploration towards the most promising regions. We develop a robot system to evaluate our approach in a real-world physical problem solving domain. After training the MoE policy in simulation, online learning in the real world demonstrates efficient adaptation within just a few dozen attempts, with a minimal sim2real gap. Our results confirm that representing multiple strategies promotes efficient adaptation in new environments and strategies learned under different dynamics can still provide useful information about where to look for good strategies.

READ FULL TEXT
research
03/12/2021

Domain Curiosity: Learning Efficient Data Collection Strategies for Domain Adaptation

Domain adaptation is a common problem in robotics, with applications suc...
research
06/02/2020

Learning Active Task-Oriented Exploration Policies for Bridging the Sim-to-Real Gap

Training robotic policies in simulation suffers from the sim-to-real gap...
research
05/20/2018

Learning Real-World Robot Policies by Dreaming

Learning to control robots directly based on images is a primary challen...
research
11/19/2019

MANGA: Method Agnostic Neural-policy Generalization and Adaptation

In this paper we target the problem of transferring policies across mult...
research
11/11/2019

MAME : Model-Agnostic Meta-Exploration

Meta-Reinforcement learning approaches aim to develop learning procedure...
research
03/02/2018

TACO: Learning Task Decomposition via Temporal Alignment for Control

Many advanced Learning from Demonstration (LfD) methods consider the dec...
research
05/06/2020

Learning Adaptive Exploration Strategies in Dynamic Environments Through Informed Policy Regularization

We study the problem of learning exploration-exploitation strategies tha...

Please sign up or login with your details

Forgot password? Click here to reset