MSMDFusion: Fusing LiDAR and Camera at Multiple Scales with Multi-Depth Seeds for 3D Object Detection

by Yang Jiao, et al.

Fusing LiDAR and camera information is essential for achieving accurate and reliable 3D object detection in autonomous driving systems. However, this is challenging due to the difficulty of combining multi-granularity geometric and semantic features from two drastically different modalities. Recent approaches exploit the semantic density of camera features by lifting points in 2D camera images (referred to as seeds) into 3D space for fusion, and they can be roughly divided into 1) early fusion of raw points, which augments the 3D point cloud at the input stage, and 2) late fusion of BEV (bird's-eye view) maps, which merges LiDAR and camera BEV features before the detection head. While both have their merits in enhancing the representation power of the combined features, such single-level fusion strategies are suboptimal solutions to the aforementioned challenge: their major drawback is the inability to sufficiently interact the multi-granularity semantic features of the two modalities. To this end, we propose a novel framework that focuses on multi-scale progressive interaction between multi-granularity LiDAR and camera features. Our proposed method, abbreviated as MSMDFusion, achieves state-of-the-art results in 3D object detection, with 69.1 mAP and 71.8 NDS on the nuScenes validation set, and 70.8 mAP and 73.2 NDS on the nuScenes test set, which ranked 1st and 2nd respectively among single-model non-ensemble approaches at the time of submission.
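The seed-lifting step described above amounts to back-projecting a 2D pixel with an associated depth into 3D camera coordinates via the inverse camera intrinsics. The sketch below illustrates this geometric operation only; the function name, the intrinsics values, and the depths are hypothetical and are not taken from the paper.

```python
import numpy as np

def lift_seeds_to_3d(pixels, depths, K):
    """Back-project 2D pixels ("seeds") with estimated depths into 3D
    camera coordinates: X = d * K^{-1} [u, v, 1]^T."""
    uv1 = np.concatenate([pixels, np.ones((len(pixels), 1))], axis=1)  # (N, 3) homogeneous pixels
    rays = uv1 @ np.linalg.inv(K).T                                    # (N, 3) unit-depth rays
    return rays * depths[:, None]                                      # scale each ray by its depth

# Hypothetical pinhole intrinsics: fx = fy = 500, principal point (320, 240).
K = np.array([[500.0,   0.0, 320.0],
              [  0.0, 500.0, 240.0],
              [  0.0,   0.0,   1.0]])
pixels = np.array([[320.0, 240.0],   # at the principal point
                   [420.0, 240.0]])  # 100 px to the right
depths = np.array([10.0, 10.0])     # metres
points = lift_seeds_to_3d(pixels, depths, K)
# points[0] -> [0, 0, 10]; points[1] -> [2, 0, 10]
```

A pixel at the principal point maps straight down the optical axis; pixels offset by 100 px at focal length 500 map to a lateral offset of (100/500) x depth. Multi-depth variants of this idea lift each seed with several candidate depths rather than a single estimate.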




SemanticBEVFusion: Rethink LiDAR-Camera Fusion in Unified Bird's-Eye View Representation for 3D Object Detection

LiDAR and camera are two essential sensors for 3D object detection in au...

Fusing Bird View LIDAR Point Cloud and Front View Camera Image for Deep Object Detection

We propose a new method for fusing a LIDAR point cloud and camera-captur...

SemanticVoxels: Sequential Fusion for 3D Pedestrian Detection using LiDAR Point Cloud and Semantic Segmentation

3D pedestrian detection is a challenging task in automated driving becau...

BEVFusion: Multi-Task Multi-Sensor Fusion with Unified Bird's-Eye View Representation

Multi-sensor fusion is essential for an accurate and reliable autonomous...

3D Dual-Fusion: Dual-Domain Dual-Query Camera-LiDAR Fusion for 3D Object Detection

Fusing data from cameras and LiDAR sensors is an essential technique to ...

MVX-Net: Multimodal VoxelNet for 3D Object Detection

Many recent works on 3D object detection have focused on designing neura...

Sparse Dense Fusion for 3D Object Detection

With the prevalence of multimodal learning, camera-LiDAR fusion has gain...
