ImGeoNet: Image-induced Geometry-aware Voxel Representation for Multi-view 3D Object Detection

by Tao Tu, et al.

We propose ImGeoNet, a multi-view image-based 3D object detection framework that models a 3D space with an image-induced geometry-aware voxel representation. Unlike previous methods, which aggregate 2D features into 3D voxels without considering geometry, ImGeoNet learns to induce geometry from multi-view images to alleviate the confusion arising from voxels of free space. During the inference phase, only images from multiple views are required. In addition, our representation can leverage a powerful pre-trained 2D feature extractor, leading to more robust performance. To evaluate the effectiveness of ImGeoNet, we conduct quantitative and qualitative experiments on three indoor datasets, namely ARKitScenes, ScanNetV2, and ScanNet200. The results demonstrate that ImGeoNet outperforms the current state-of-the-art multi-view image-based method, ImVoxelNet, on all three datasets in terms of detection accuracy. ImGeoNet is also data-efficient: with only 40 views, it achieves results comparable to ImVoxelNet with 100 views. Furthermore, our studies indicate that the proposed image-induced geometry-aware representation enables image-based methods to attain higher detection accuracy than the seminal point cloud-based method, VoteNet, in two practical scenarios: (1) scenarios where point clouds are sparse and noisy, such as in ARKitScenes, and (2) scenarios that involve diverse object classes, particularly classes of small objects, as is the case in ScanNet200.
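
The abstract does not spell out the architecture, so the following is only a minimal NumPy sketch of the general idea it describes, not the authors' implementation. It back-projects voxel centers into each view, averages the sampled 2D features per voxel (the geometry-unaware aggregation), and then scales each voxel by a per-voxel weight so that free-space voxels contribute less. The function names, the nearest-pixel sampling, the pinhole camera convention, and the `surface_prob` input are all assumptions made for illustration; in a real system that weight would be predicted by a learned module.

```python
import numpy as np


def backproject_features(feats, intrinsics, extrinsics, voxel_centers):
    """Average per-view 2D features into voxels (no geometry awareness).

    feats:         (V, C, H, W) feature maps, one per view
    intrinsics:    (V, 3, 3) pinhole matrices in feature-map pixel units
    extrinsics:    (V, 4, 4) world-to-camera transforms
    voxel_centers: (N, 3) voxel centers in world coordinates
    returns:       (N, C) voxel features, (N,) number of views seeing each voxel
    """
    num_views, channels, height, width = feats.shape
    num_voxels = voxel_centers.shape[0]
    homo = np.concatenate([voxel_centers, np.ones((num_voxels, 1))], axis=1)

    acc = np.zeros((num_voxels, channels))
    cnt = np.zeros(num_voxels)
    for v in range(num_views):
        cam = (extrinsics[v] @ homo.T)[:3]      # (3, N) camera-frame points
        depth = cam[2]
        in_front = depth > 1e-6
        pix = intrinsics[v] @ cam               # (3, N) homogeneous pixels
        safe_depth = np.where(in_front, depth, 1.0)
        col = np.round(pix[0] / safe_depth).astype(int)
        row = np.round(pix[1] / safe_depth).astype(int)
        valid = (in_front & (col >= 0) & (col < width)
                 & (row >= 0) & (row < height))
        # Nearest-pixel lookup; bilinear sampling would be a common alternative.
        acc[valid] += feats[v][:, row[valid], col[valid]].T
        cnt[valid] += 1

    # Voxels seen by no view keep an all-zero feature.
    return acc / np.maximum(cnt, 1)[:, None], cnt


def apply_geometry_weights(voxel_feats, surface_prob):
    """Scale each voxel feature by a hypothetical surface probability in [0, 1].

    surface_prob is an assumed input here; it stands in for whatever
    geometry estimate a learned module would induce from the images.
    """
    return voxel_feats * surface_prob[:, None]
```

With weights near zero for free-space voxels, the downstream detector sees features concentrated near surfaces rather than smeared along every camera ray, which is the confusion the abstract refers to.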




Multi-View 3D Object Detection Network for Autonomous Driving

This paper aims at high-accuracy 3D object detection in autonomous drivi...

MLOD: A multi-view 3D object detection based on robust feature fusion method

This paper presents Multi-view Labelling Object Detector (MLOD). The det...

CVFNet: Real-time 3D Object Detection by Learning Cross View Features

In recent years 3D object detection from LiDAR point clouds has made gre...

From Multi-View to Hollow-3D: Hallucinated Hollow-3D R-CNN for 3D Object Detection

As an emerging data modal with precise distance sensing, LiDAR point clo...

Surface Light Field Compression using a Point Cloud Codec

Light field (LF) representations aim to provide photo-realistic, free-vi...

Learning from Multi-View Representation for Point-Cloud Pre-Training

A critical problem in the pre-training of 3D point clouds is leveraging ...

Learning Photometric Feature Transform for Free-form Object Scan

We propose a novel framework to automatically learn to aggregate and tra...
