Tolerance-Guided Policy Learning for Adaptable and Transferrable Delicate Industrial Insertion

by Boshen Niu*, et al.

Policy learning for delicate industrial insertion tasks (e.g., PC board assembly) is challenging. This paper considers two major problems: how to learn a diversified policy (instead of just one average policy) that can efficiently handle different workpieces with a minimal amount of training data, and how to handle workpiece defects during insertion. To address these problems, we propose tolerance-guided policy learning. To encourage transferability of the learned policy to different workpieces, we add a task embedding, derived from the insertion tolerance, to the policy's input space. We then train the policy using generative adversarial imitation learning with reward shaping (RS-GAIL) on a variety of representative situations. To encourage adaptability of the learned policy to defects, we build a probabilistic inference model that uses the tolerance model to output the best inserting pose based on failed insertions. The best inserting pose is then used as a reference for the learned policy. The proposed method is validated on a sequence of IC socket insertion tasks in simulation. The results show that 1) RS-GAIL can efficiently learn optimal policies under sparse rewards; 2) the tolerance embedding can enhance the transferability of the learned policy; and 3) the probabilistic inference makes the policy robust to defects on the workpieces.
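The two tolerance-related ideas in the abstract can be illustrated with a small sketch. This is not the paper's implementation: the function names `augment_observation` and `infer_best_pose`, the 1-D pose grid, the uniform prior, and the simple "a failure at pose p rules out true poses within the tolerance of p" likelihood are all assumptions made purely for illustration.

```python
def augment_observation(observation, tolerance):
    """Append the insertion tolerance to the observation as a task embedding.

    Hypothetical helper: the paper only states that a tolerance-based
    embedding is added to the policy's input space.
    """
    return list(observation) + [float(tolerance)]


def infer_best_pose(candidates, failed_poses, tolerance, eps=1e-3):
    """Grid-based Bayesian update over 1-D candidate insertion poses.

    Assumed likelihood (illustrative only): a failed insertion at pose p has
    probability eps if the true pose lies within `tolerance` of p, and
    probability 1 - eps otherwise.
    """
    # Start from a uniform prior over the candidate poses.
    posterior = [1.0 / len(candidates)] * len(candidates)
    for p in failed_poses:
        # Down-weight candidates that the failed attempt should have reached.
        posterior = [
            w * (eps if abs(c - p) <= tolerance else 1.0 - eps)
            for w, c in zip(posterior, candidates)
        ]
        total = sum(posterior)
        posterior = [w / total for w in posterior]
    # Return the maximum a posteriori pose and the full posterior.
    best = max(range(len(candidates)), key=lambda i: posterior[i])
    return candidates[best], posterior


# Example: four failed attempts leave one plausible pose on the grid.
candidates = [-0.2, -0.1, 0.0, 0.1, 0.2]
best_pose, posterior = infer_best_pose(
    candidates, failed_poses=[-0.2, -0.1, 0.0, 0.2], tolerance=0.05
)
policy_input = augment_observation([best_pose, 0.0], tolerance=0.05)
```

In this sketch the inferred pose is passed to the policy as part of its input alongside the tolerance, mirroring the abstract's description of the best inserting pose serving as a reference for the learned policy.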



A Composable Framework for Policy Design, Learning, and Transfer Toward Safe and Efficient Industrial Insertion

Delicate industrial insertion tasks (e.g., PC board assembly) remain cha...

Generative Adversarial Self-Imitation Learning

This paper explores a simple regularizer for reinforcement learning by p...

Multi-Task Imitation Learning for Linear Dynamical Systems

We study representation learning for efficient imitation learning over l...

Learning Self-Imitating Diverse Policies

Deep reinforcement learning algorithms, including policy gradient method...

Goal-Conditioned Imitation Learning using Score-based Diffusion Policies

We propose a new policy representation based on score-based diffusion mo...

Guided Imitation of Task and Motion Planning

While modern policy optimization methods can do complex manipulation fro...
