Virtual Sparse Convolution for Multimodal 3D Object Detection

by Hai Wu, et al.

Recently, virtual/pseudo-point-based 3D object detection, which fuses RGB images and LiDAR data through depth completion, has gained great attention. However, virtual points generated from an image are very dense and introduce a large amount of redundant computation during detection. Meanwhile, noise introduced by inaccurate depth completion significantly degrades detection precision. This paper proposes a fast yet effective backbone, termed VirConvNet, based on a new operator, VirConv (Virtual Sparse Convolution), for virtual-point-based 3D object detection. VirConv consists of two key designs: (1) StVD (Stochastic Voxel Discard) and (2) NRConv (Noise-Resistant Submanifold Convolution). StVD alleviates the computation problem by discarding large amounts of nearby redundant voxels. NRConv tackles the noise problem by encoding voxel features in both 2D image space and 3D LiDAR space. By integrating VirConv, we first develop an efficient pipeline, VirConv-L, based on an early fusion design. Then, we build a high-precision pipeline, VirConv-T, based on a transformed refinement scheme. Finally, we develop a semi-supervised pipeline, VirConv-S, based on a pseudo-label framework. On the KITTI car 3D detection test leaderboard, our VirConv-L achieves 85 [...]. Our VirConv-T and VirConv-S attain a high precision of 86.3 [...] and currently rank 2nd and 1st, respectively. The code is available at [...]
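The idea behind StVD, randomly dropping nearby voxels while leaving distant ones untouched, can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `stochastic_voxel_discard`, the `near_range` threshold, and the `keep_ratio` value are hypothetical choices made for illustration, and the abstract does not specify how "nearby" is measured or what fraction is discarded.

```python
import math
import random


def stochastic_voxel_discard(voxels, near_range=30.0, keep_ratio=0.1, seed=None):
    """Randomly drop nearby voxels and keep every distant one.

    voxels: list of (x, y, z) voxel centers in the sensor frame, in meters.
    near_range, keep_ratio: illustrative assumptions, not values from the source.
    """
    rng = random.Random(seed)
    kept = []
    for voxel in voxels:
        x, y, _ = voxel
        # Horizontal distance from the sensor origin (an assumed definition of "nearby").
        distance = math.hypot(x, y)
        # Distant voxels are always kept; nearby ones survive with probability keep_ratio.
        if distance >= near_range or rng.random() < keep_ratio:
            kept.append(voxel)
    return kept


if __name__ == "__main__":
    rng = random.Random(0)
    cloud = [
        (rng.uniform(0.0, 70.0), rng.uniform(-40.0, 40.0), rng.uniform(-3.0, 1.0))
        for _ in range(10000)
    ]
    kept = stochastic_voxel_discard(cloud, seed=0)
    print(f"kept {len(kept)} of {len(cloud)} voxels")
```

Because dense virtual points concentrate close to the sensor, thinning only the near range removes most of the redundant computation while preserving the sparse, distant regions where every point matters.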




End-to-End Pseudo-LiDAR for Image-Based 3D Object Detection

Reliable and accurate 3D object detection is a necessity for safe autono...

Point Density-Aware Voxels for LiDAR 3D Object Detection

LiDAR has become one of the primary 3D object detection sensors in auton...

RTS3D: Real-time Stereo 3D Detection from 4D Feature-Consistency Embedding Space for Autonomous Driving

Although the recent image-based 3D object detection methods using Pseudo...

FSD V2: Improving Fully Sparse 3D Object Detection with Virtual Voxels

LiDAR-based fully sparse architecture has garnered increasing attention....

Accurate and Real-time Pseudo Lidar Detection: Is Stereo Neural Network Really Necessary?

The proposal of Pseudo-Lidar representation has significantly narrowed t...

VPFNet: Voxel-Pixel Fusion Network for Multi-class 3D Object Detection

Many LiDAR-based methods for detecting large objects, single-class objec...

VPFNet: Improving 3D Object Detection with Virtual Point based LiDAR and Stereo Data Fusion

It has been well recognized that fusing the complementary information fr...
