Boosting Monocular 3D Object Detection with Object-Centric Auxiliary Depth Supervision

10/29/2022
by Youngseok Kim, et al.

Recent advances in monocular 3D detection explicitly use a depth estimation network as an intermediate stage of the 3D detection network. These depth map approaches estimate object depth more accurately than other methods because the depth estimation network is trained on a large-scale dataset. However, depth map approaches are limited by the accuracy of the depth map, and running two separate networks in sequence, one for depth estimation and one for 3D detection, significantly increases computation cost and inference time. In this work, we propose a method to boost an RGB image-based 3D detector by jointly training the detection network with a depth prediction loss analogous to the one used for depth estimation. In this way, our 3D detection network receives additional depth supervision from raw LiDAR points, which requires no human annotation, and learns to estimate accurate depth without explicitly predicting a depth map. Our novel object-centric depth prediction loss focuses on depth around foreground objects, which matters most for 3D object detection, so that pixel-wise depth supervision is exploited in an object-centric manner. Our depth regression model is further trained to predict the uncertainty of its depth estimate, which serves as the 3D confidence of each object. To train the 3D detector effectively with raw LiDAR points and to enable end-to-end training, we revisit the regression target of 3D objects and design a corresponding network architecture. Extensive experiments on the KITTI and nuScenes benchmarks show that our method significantly boosts the monocular image-based 3D detector, outperforming depth map approaches while maintaining real-time inference speed.
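The abstract describes three ingredients: sparse pixel-wise depth targets from raw LiDAR, a loss restricted to the surroundings of foreground objects, and a predicted depth uncertainty. The NumPy sketch below illustrates one plausible way to combine them; it is not the authors' implementation. Assumptions: LiDAR points are already transformed into camera coordinates; "around foreground objects" is approximated by pixels inside (optionally enlarged) 2D boxes; the uncertainty term is a Laplacian negative log-likelihood, a common choice in monocular 3D detection that the abstract does not confirm; and `depth_confidence` is a hypothetical mapping from uncertainty to a score. All function names are hypothetical.

```python
import numpy as np


def project_lidar_to_depth(points_cam, K, height, width):
    """Project LiDAR points (N, 3), assumed already in camera coordinates,
    into a sparse depth map. 0 marks pixels with no LiDAR return."""
    pts = points_cam[points_cam[:, 2] > 0]           # keep points in front of the camera
    uvw = pts @ K.T                                  # homogeneous pixel coordinates
    u = np.floor(uvw[:, 0] / uvw[:, 2]).astype(int)
    v = np.floor(uvw[:, 1] / uvw[:, 2]).astype(int)
    z = pts[:, 2]
    inside = (u >= 0) & (u < width) & (v >= 0) & (v < height)
    depth = np.full((height, width), np.inf)
    # Unbuffered minimum: the nearest point wins when several hit one pixel.
    np.minimum.at(depth, (v[inside], u[inside]), z[inside])
    depth[np.isinf(depth)] = 0.0
    return depth


def object_centric_mask(boxes_2d, height, width, margin=0):
    """Hypothetical foreground mask: pixels inside 2D boxes (x1, y1, x2, y2),
    each enlarged by `margin` pixels."""
    mask = np.zeros((height, width), dtype=bool)
    for x1, y1, x2, y2 in boxes_2d:
        c0 = max(int(np.floor(x1)) - margin, 0)
        r0 = max(int(np.floor(y1)) - margin, 0)
        c1 = min(int(np.ceil(x2)) + margin, width)
        r1 = min(int(np.ceil(y2)) + margin, height)
        mask[r0:r1, c0:c1] = True
    return mask


def object_centric_depth_loss(pred_depth, pred_log_sigma, lidar_depth, fg_mask):
    """Assumed Laplacian NLL (constant dropped), averaged over foreground
    pixels that have a LiDAR return: sqrt(2)*|d - d*|/sigma + log(sigma)."""
    valid = fg_mask & (lidar_depth > 0)
    if not valid.any():
        return 0.0
    err = np.abs(pred_depth[valid] - lidar_depth[valid])
    log_sigma = pred_log_sigma[valid]
    loss = np.sqrt(2.0) * err * np.exp(-log_sigma) + log_sigma
    return float(loss.mean())


def depth_confidence(log_sigma):
    """Hypothetical uncertainty-to-confidence mapping: exp(-sigma) in (0, 1]."""
    return np.exp(-np.exp(log_sigma))


# Toy example with made-up numbers: a 16x16 image and one 2D box.
K = np.array([[100.0, 0.0, 8.0], [0.0, 100.0, 8.0], [0.0, 0.0, 1.0]])
points = np.array([[0.0, 0.0, 10.0], [0.0, 0.0, 20.0],
                   [-0.4, 0.0, 10.0], [0.0, 0.0, -5.0]])
lidar = project_lidar_to_depth(points, K, 16, 16)
fg = object_centric_mask([(6, 6, 10, 10)], 16, 16)
loss = object_centric_depth_loss(np.full((16, 16), 11.0), np.zeros((16, 16)), lidar, fg)
```

In the toy example, two LiDAR points land on pixel (8, 8), and the nearer one (depth 10) is kept. Only that pixel lies inside the box, so the loss reduces to √2 · |11 − 10| / 1 + log 1 = √2. The pixel at (8, 4) has a LiDAR return but is ignored because it falls outside the foreground mask. This restriction is the object-centric part of the sketch.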
