MMRDN: Consistent Representation for Multi-View Manipulation Relationship Detection in Object-Stacked Scenes

04/25/2023
by Han Wang, et al.

Manipulation relationship detection (MRD) aims to guide a robot to grasp objects in the right order, which is important for ensuring safe and reliable grasping in object-stacked scenes. Previous works infer manipulation relationships with deep neural networks trained on data collected from a single predefined view, which leaves them vulnerable to visual dislocation in unstructured environments. Multi-view data provide more comprehensive spatial information, but a key challenge of multi-view MRD is domain shift across views. In this paper, we propose a novel multi-view fusion framework, the multi-view MRD network (MMRDN), which is trained on both 2D and 3D multi-view data. We project the 2D data from different views into a common hidden space and fit the embeddings with a set of von Mises-Fisher distributions to learn consistent representations. In addition, taking advantage of the position information in the 3D data, we select a set of K Maximum Vertical Neighbors (KMVN) points from the point cloud of each object pair, which encodes the relative position of the two objects. Finally, the features of the multi-view 2D and 3D data are concatenated to predict the pairwise relationships between objects. Experimental results on the challenging REGRAD dataset show that MMRDN outperforms state-of-the-art methods on multi-view MRD tasks. The results also demonstrate that our model, trained on synthetic data, transfers to real-world scenarios.
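The abstract mentions fitting view embeddings with von Mises-Fisher (vMF) distributions to learn consistent representations. As a minimal sketch of that idea (not the paper's actual training objective), the snippet below estimates vMF parameters for a set of unit-normalized embeddings using the standard moment-based approximation of Banerjee et al. (2005); the function name and the use of plain NumPy are illustrative assumptions.

```python
import numpy as np

def fit_vmf(X):
    """Estimate von Mises-Fisher parameters from embeddings X of shape (n, p).

    Returns the mean direction mu (unit vector) and an approximate
    concentration kappa: kappa ~= r_bar * (p - r_bar^2) / (1 - r_bar^2),
    where r_bar is the mean resultant length. A large kappa means the
    embeddings cluster tightly around mu, i.e. the views agree.
    """
    # Project embeddings onto the unit hypersphere, as vMF requires.
    X = X / np.linalg.norm(X, axis=1, keepdims=True)
    n, p = X.shape
    s = X.sum(axis=0)
    r_bar = np.linalg.norm(s) / n          # mean resultant length in [0, 1]
    mu = s / np.linalg.norm(s)             # maximum-likelihood mean direction
    kappa = r_bar * (p - r_bar**2) / (1 - r_bar**2)  # approximation
    return mu, kappa
```

In a multi-view setting, one could fit a vMF per object across its view embeddings; a higher estimated kappa would indicate that the projections from different views have converged to a consistent representation.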


Related research

07/19/2022
DUQIM-Net: Probabilistic Object Hierarchy Representation for Multi-View Manipulation
Object manipulation in cluttered scenes is a difficult and important pro...

02/05/2023
Multi-View Masked World Models for Visual Robotic Manipulation
Visual robotic manipulation research and applications often use multiple...

10/23/2018
Hierarchy-Dependent Cross-Platform Multi-View Feature Learning for Venue Category Prediction
In this work, we focus on visual venue category prediction, which can fa...

04/05/2022
Multi-View Transformer for 3D Visual Grounding
The 3D visual grounding task aims to ground a natural language descripti...

08/31/2022
Scatter Points in Space: 3D Detection from Multi-view Monocular Images
3D object detection from monocular image(s) is a challenging and long-st...

12/07/2021
Voxelized 3D Feature Aggregation for Multiview Detection
Multi-view detection incorporates multiple camera views to alleviate occ...
