3D-RETR: End-to-End Single and Multi-View 3D Reconstruction with Transformers

10/17/2021
by Zai Shi, et al.

3D reconstruction aims to recover 3D objects from 2D views. Previous work on 3D reconstruction has mainly relied on feature matching between views or on CNN backbones. Recently, Transformers have proven effective in multiple computer vision applications. However, whether Transformers can be used for 3D reconstruction remains unclear. In this paper, we fill this gap by proposing 3D-RETR, which performs end-to-end 3D REconstruction with TRansformers. 3D-RETR first uses a pretrained Transformer to extract visual features from 2D input images. It then uses a separate Transformer Decoder to obtain voxel features. Finally, a CNN Decoder takes the voxel features as input and produces the reconstructed objects. 3D-RETR can reconstruct from a single view or from multiple views. Experimental results on two datasets show that 3D-RETR reaches state-of-the-art performance on 3D reconstruction. An additional ablation study demonstrates that 3D-RETR benefits from using Transformers.
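
The three stages described above (Transformer encoder, Transformer Decoder, CNN Decoder) can be pictured by tracing tensor shapes through the pipeline. The following is only a minimal sketch under stated assumptions: the abstract gives no sizes, so the image size, patch size, embedding width, voxel grid, and number of upsampling steps are all hypothetical. Concatenating tokens across views is likewise an assumed fusion scheme, not something the abstract specifies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PipelineConfig:
    # Every value here is an illustrative assumption, not taken from the paper.
    image_size: int = 224      # input images are image_size x image_size
    patch_size: int = 16       # encoder splits each image into square patches
    embed_dim: int = 768       # width of each token
    voxel_grid: int = 4        # decoder emits voxel_grid**3 voxel tokens
    upsample_steps: int = 3    # each CNN decoder stage doubles the resolution


def encoder_shape(cfg: PipelineConfig, num_views: int) -> tuple:
    """Shape of the visual features: one token per patch, per view.

    Assumption: tokens from all views are concatenated into one sequence.
    """
    patches_per_view = (cfg.image_size // cfg.patch_size) ** 2
    return (num_views * patches_per_view, cfg.embed_dim)


def decoder_shape(cfg: PipelineConfig) -> tuple:
    """Shape of the voxel features: one token per coarse voxel cell."""
    return (cfg.voxel_grid ** 3, cfg.embed_dim)


def cnn_output_shape(cfg: PipelineConfig) -> tuple:
    """Shape of the reconstructed occupancy grid after upsampling."""
    side = cfg.voxel_grid * 2 ** cfg.upsample_steps
    return (side, side, side)


def trace_shapes(cfg: PipelineConfig, num_views: int) -> dict:
    """Collect the shape produced by each stage for a given number of views."""
    if num_views < 1:
        raise ValueError("at least one input view is required")
    return {
        "encoder": encoder_shape(cfg, num_views),
        "decoder": decoder_shape(cfg),
        "output": cnn_output_shape(cfg),
    }


if __name__ == "__main__":
    cfg = PipelineConfig()
    for views in (1, 3):
        print(views, trace_shapes(cfg, views))
```

Under these assumptions, only the encoder output grows with the number of views. The voxel features and the final grid keep a fixed size, which is one way a single model could serve both the single-view and multi-view settings.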


