Learning a High-Precision Robotic Assembly Task Using Pose Estimation from Simulated Depth Images

by Yuval Litvak, et al.

Most industrial robotic assembly tasks today require fixed initial conditions for successful assembly. These constraints induce high production costs and low adaptability to new tasks. In this work we aim towards flexible and adaptable robotic assembly by using 3D CAD models for all parts to be assembled. We focus on a generic assembly task - the Siemens Innovation Challenge - in which a robot needs to assemble a gear-like mechanism with high precision into an operating system. To obtain the millimeter accuracy required for this task and for industrial settings alike, we use a depth camera mounted near the robot end-effector. We present a high-accuracy three-stage pose estimation pipeline based on deep convolutional neural networks, which includes detection, pose estimation, refinement, and handling of near and full symmetries of parts. The networks are trained on simulated depth images, with measures to ensure successful transfer to the real robot. We obtain an average pose estimation error of 2.14 millimeters and 1.09 degrees, leading to an 88.6% success rate for robotic assembly of randomly distributed parts. To the best of our knowledge, this is the first time that the Siemens Innovation Challenge is fully solved, opening up new possibilities for automated industrial assembly.
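
The pose errors quoted above (millimeters for translation, degrees for rotation) can be illustrated with a minimal sketch of how such errors are commonly computed, including one simple way to account for part symmetries. This is not the authors' evaluation code: the function names, the use of 3x3 rotation matrices, and the choice to take the minimum error over a list of symmetry rotations are all assumptions made for illustration.

```python
import numpy as np

def translation_error_mm(t_est, t_gt):
    """Euclidean distance between estimated and true positions (inputs in mm)."""
    return float(np.linalg.norm(np.asarray(t_est, float) - np.asarray(t_gt, float)))

def rotation_error_deg(R_est, R_gt):
    """Geodesic angle (degrees) between two 3x3 rotation matrices."""
    R_rel = R_est.T @ R_gt
    # trace(R) = 1 + 2*cos(angle); clip guards against rounding outside [-1, 1]
    cos_angle = np.clip((np.trace(R_rel) - 1.0) / 2.0, -1.0, 1.0)
    return float(np.degrees(np.arccos(cos_angle)))

def rot_z(deg):
    """Rotation about the z-axis by `deg` degrees (helper for the example)."""
    a = np.radians(deg)
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def symmetric_rotation_error_deg(R_est, R_gt, symmetries):
    """Smallest rotation error over all poses equivalent under the part's symmetries.

    `symmetries` is an assumed list of 3x3 rotations (including the identity)
    that map the part onto itself in its own frame.
    """
    return min(rotation_error_deg(R_est, R_gt @ S) for S in symmetries)

if __name__ == "__main__":
    # Hypothetical part with a two-fold symmetry about its z-axis.
    syms = [np.eye(3), rot_z(180.0)]
    R_gt, R_est = np.eye(3), rot_z(185.0)
    print(translation_error_mm([1.0, 2.0, 2.0], [0.0, 0.0, 0.0]))
    print(rotation_error_deg(R_est, R_gt))
    print(symmetric_rotation_error_deg(R_est, R_gt, syms))
```

In the example, an estimate rotated by 185 degrees about the symmetry axis is 175 degrees from the ground truth in the plain metric but only 5 degrees away once the two-fold symmetry is taken into account, which is why symmetric parts need special handling when reporting rotation error.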


