Multi-Modality Task Cascade for 3D Object Detection

by Jinhyung Park, et al.

Point clouds and RGB images are naturally complementary modalities for 3D visual understanding: the former provides sparse but accurate locations of points on objects, while the latter contains dense color and texture information. Despite this potential for close sensor fusion, many methods train two models in isolation and use simple feature concatenation to represent 3D sensor data. This separated training scheme results in potentially sub-optimal performance and prevents 3D tasks from being used to benefit 2D tasks, which are often useful on their own. To provide a more integrated approach, we propose a novel Multi-Modality Task Cascade network (MTC-RCNN) that leverages 3D box proposals to improve 2D segmentation predictions, which are then used to further refine the 3D boxes. We show that including a 2D network between two stages of 3D modules significantly improves both 2D and 3D task performance. Moreover, to prevent the 3D module from over-relying on overfitted 2D predictions, we propose a dual-head 2D segmentation training and inference scheme, allowing the second 3D module to learn to interpret imperfect 2D segmentation predictions. Evaluating our model on the challenging SUN RGB-D dataset, we improve upon state-of-the-art results of both single-modality and fusion networks by a large margin ($\textbf{+3.8}$ mAP@0.5). Code will be released.
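The cascade described above can be sketched as a simple dataflow: stage-1 3D proposals condition a 2D segmentation network, and the resulting masks are fused back into the point cloud before a stage-2 3D refinement. The following is a minimal, hypothetical sketch of that idea with placeholder functions; the names and shapes are illustrative assumptions, not the authors' actual API.

```python
import numpy as np

def stage1_3d_proposals(points):
    # Placeholder: predict N coarse 3D boxes (x, y, z, w, l, h, yaw) from points.
    n_proposals = 4
    return np.zeros((n_proposals, 7))

def segment_2d(image, proposals_3d):
    # Placeholder: per-pixel foreground scores, conditioned on the projected
    # stage-1 3D proposals (the 3D -> 2D half of the cascade).
    h, w, _ = image.shape
    return np.full((h, w), 0.5)

def paint_points(points, seg_mask):
    # Append a 2D segmentation score to each point as an extra feature channel
    # (a trivial stand-in for projecting each point into the mask).
    scores = np.full((points.shape[0], 1), seg_mask.mean())
    return np.concatenate([points, scores], axis=1)

def stage2_3d_refine(painted_points, proposals_3d):
    # Placeholder: refine the stage-1 boxes using segmentation-painted points
    # (the 2D -> 3D half of the cascade).
    return proposals_3d.copy()

def mtc_rcnn_forward(points, image):
    proposals = stage1_3d_proposals(points)       # stage-1 3D proposals
    seg = segment_2d(image, proposals)            # 3D-conditioned 2D segmentation
    painted = paint_points(points, seg)           # fuse 2D scores into points
    boxes = stage2_3d_refine(painted, proposals)  # stage-2 refined 3D boxes
    return boxes, seg
```

The dual-head scheme mentioned in the abstract would then train two copies of `segment_2d` and feed the stage-2 module predictions from the head it was not trained alongside, so refinement learns from imperfect masks rather than overfitted ones.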




