Learning from Multi-View Representation for Point-Cloud Pre-Training

06/05/2023
by Siming Yan, et al.

A critical problem in point-cloud pre-training is how to leverage massive 2D data, and the fundamental challenge in doing so is the 2D-3D domain gap. This paper proposes a novel approach to point-cloud pre-training that learns 3D representations by leveraging pre-trained 2D networks. In particular, it avoids overfitting to 2D representations, which risks discarding 3D features that are critical for 3D recognition tasks. The key to our approach is a novel multi-view representation: the model learns a shared 3D feature volume that is consistent with deep features extracted from multiple 2D camera views. The 2D deep features are regularized by pre-trained 2D networks through a 2D knowledge transfer loss. To prevent the resulting 3D feature representations from discarding 3D signals, we introduce a multi-view consistency loss that forces the projected 2D feature representations to capture pixel-wise correspondences across different views. Such correspondences induce 3D geometry and effectively retain 3D features in the projected 2D features. Experimental results demonstrate that our pre-trained model transfers successfully to various downstream tasks, including 3D detection and semantic segmentation, and achieves state-of-the-art performance.
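
The two loss terms named in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not give the exact form of either loss, so the mean-squared-error transfer term, the cosine-based consistency term, the pinhole projection, and every function and variable name below are assumptions chosen for illustration. Random arrays stand in for the projected 2D features and for the features of a frozen pre-trained 2D network.

```python
import numpy as np

def project_points(points, K, R, t):
    """Project (N, 3) world points to pixels with a pinhole camera (assumed model)."""
    cam = points @ R.T + t                      # world frame -> camera frame
    uvw = cam @ K.T                             # camera frame -> homogeneous pixels
    return uvw[:, :2] / uvw[:, 2:3], cam[:, 2]  # pixel coordinates, depth

def knowledge_transfer_loss(student_feats, teacher_feats):
    """Assumed form: MSE between projected features and frozen 2D teacher features."""
    return float(np.mean((student_feats - teacher_feats) ** 2))

def multiview_consistency_loss(feats_a, feats_b):
    """Assumed form: mean (1 - cosine similarity) over corresponding pixels.

    Row i of feats_a and row i of feats_b are features sampled at the two
    pixels onto which the same 3D point projects in view A and view B.
    """
    a = feats_a / (np.linalg.norm(feats_a, axis=1, keepdims=True) + 1e-8)
    b = feats_b / (np.linalg.norm(feats_b, axis=1, keepdims=True) + 1e-8)
    return float(np.mean(1.0 - np.sum(a * b, axis=1)))

# Toy scene: five 3D points in front of two cameras (all values hypothetical).
rng = np.random.default_rng(0)
points = rng.uniform(-1.0, 1.0, size=(5, 3)) + np.array([0.0, 0.0, 4.0])
K = np.array([[100.0, 0.0, 32.0], [0.0, 100.0, 32.0], [0.0, 0.0, 1.0]])
uv_a, depth_a = project_points(points, K, np.eye(3), np.zeros(3))
uv_b, depth_b = project_points(points, K, np.eye(3), np.array([0.5, 0.0, 0.0]))

# Stand-ins for features read out at uv_a / uv_b and for the 2D teacher network.
feats_a = rng.normal(size=(5, 8))
feats_b = feats_a + 0.1 * rng.normal(size=(5, 8))
teacher_a = rng.normal(size=(5, 8))

total_loss = (knowledge_transfer_loss(feats_a, teacher_a)
              + multiview_consistency_loss(feats_a, feats_b))
```

In this sketch the transfer term pulls the projected features toward the 2D teacher, while the consistency term depends on which pixels correspond across views, and that correspondence is determined by the 3D geometry. How the two terms are weighted is not stated in the abstract, so the unweighted sum here is also an assumption.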

research
10/28/2022

Self-Supervised Learning with Multi-View Rendering for 3D Point Cloud Analysis

Recently, great progress has been made in 3D deep learning with the emer...
research
04/20/2023

Multi-view Vision-Prompt Fusion Network: Can 2D Pre-trained Model Boost 3D Point Cloud Data-scarce Learning?

Point cloud based 3D deep model has wide applications in many applicatio...
research
10/02/2020

Pre-Training by Completing Point Clouds

There has recently been a flurry of exciting advances in deep learning m...
research
08/04/2022

MVSFormer: Multi-View Stereo with Pre-trained Vision Transformers and Temperature-based Depth

Feature representation learning is the key recipe for learning-based Mul...
research
08/17/2023

ImGeoNet: Image-induced Geometry-aware Voxel Representation for Multi-view 3D Object Detection

We propose ImGeoNet, a multi-view image-based 3D object detection framew...
research
03/11/2023

FAC: 3D Representation Learning via Foreground Aware Feature Contrast

Contrastive learning has recently demonstrated great potential for unsup...
