Multi-Modal Fusion in Contact-Rich Precise Tasks via Hierarchical Policy Learning

by Piaopiao Jin, et al.

Combined visual and force feedback plays an essential role in contact-rich robotic manipulation tasks. Current methods focus on developing feedback control around a single modality and underrate the synergy between sensors. Fusing different sensor modalities is necessary but remains challenging. A key challenge is to achieve an effective multi-modal control scheme that generalizes to novel objects with precision. This paper proposes a practical multi-modal sensor fusion mechanism using hierarchical policy learning. First, we use a self-supervised encoder that extracts multi-view visual features and a hybrid motion/force controller that regulates force behaviors. Next, the multi-modality fusion is simplified by hierarchical integration of the vision, force, and proprioceptive data in the reinforcement learning (RL) algorithm. Moreover, with hierarchical policy learning, the control scheme can exploit the limits of visual feedback and explore the contribution of each individual modality in precise tasks. Experiments indicate that robots with the control scheme can assemble objects with 0.25 mm clearance in simulation. The system generalizes to widely varied initial configurations and new shapes. Experiments validate that the simulated system can be robustly transferred to reality without fine-tuning.
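
The abstract names a hybrid motion/force controller but does not give its control law. As a rough illustration of the general idea only, the sketch below splits the Cartesian axes into force-controlled and motion-controlled sets and computes a velocity command for each. The function name, the gains, and the sign convention (force is the force the robot exerts along each axis) are all assumptions made for this example and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class HybridGains:
    # Both gains are illustrative assumptions, not values from the paper.
    kp_pos: float = 2.0   # position error -> velocity gain, in 1/s
    kf: float = 0.002     # force error -> velocity gain, in m/(N*s)


def hybrid_command(
    pos: Sequence[float],
    pos_ref: Sequence[float],
    force: Sequence[float],
    force_ref: Sequence[float],
    force_axes: Sequence[bool],
    gains: HybridGains = HybridGains(),
) -> List[float]:
    """Return one Cartesian velocity command per axis.

    Axes flagged True in force_axes track force_ref; all other axes
    track pos_ref. Forces are assumed to be those exerted by the robot,
    expressed along the same axes as the positions.
    """
    cmd = []
    for p, p_ref, f, f_ref, use_force in zip(
        pos, pos_ref, force, force_ref, force_axes
    ):
        if use_force:
            # Force-controlled axis: move so the exerted force approaches f_ref.
            cmd.append(gains.kf * (f_ref - f))
        else:
            # Motion-controlled axis: proportional position tracking.
            cmd.append(gains.kp_pos * (p_ref - p))
    return cmd


# Example: track position in x and y, regulate pressing force along z.
velocity = hybrid_command(
    pos=[0.0, 0.0, 0.1],
    pos_ref=[0.01, 0.0, 0.1],
    force=[0.0, 0.0, -2.0],
    force_ref=[0.0, 0.0, -5.0],
    force_axes=[False, False, True],
)
```

In the example call, the z axis is pressing with 2 N where 5 N is requested, so the command moves further in the negative z direction, while the x axis closes a 1 cm position error. In a learned pipeline such as the one the abstract describes, a policy would presumably supply the references rather than fixed values, but that wiring is not specified here.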




Toward Fine Contact Interactions: Learning to Control Normal Contact Force with Limited Information

Dexterous manipulation of objects through fine control of physical conta...

Symmetric Models for Visual Force Policy Learning

While it is generally acknowledged that force feedback is beneficial to ...

Leveraging Multi-modal Sensing for Robotic Insertion Tasks in R&D Laboratories

Performing a large volume of experiments in Chemistry labs creates repet...

Rotating Objects via In-Hand Pivoting using Vision, Force and Touch

We propose a robotic manipulation system that can pivot objects on a sur...

Understanding Multi-Modal Perception Using Behavioral Cloning for Peg-In-a-Hole Insertion Tasks

One of the main challenges in peg-in-a-hole (PiH) insertion tasks is in ...

Detect, Reject, Correct: Crossmodal Compensation of Corrupted Sensors

Using sensor data from multiple modalities presents an opportunity to en...

Touch-based Curiosity for Sparse-Reward Tasks

Robots in many real-world settings have access to force/torque sensors i...
