UMIFormer: Mining the Correlations between Similar Tokens for Multi-View 3D Reconstruction

02/27/2023
by Zhenwei Zhu, et al.

In recent years, many video tasks have achieved breakthroughs by using the vision transformer and decoupling spatial and temporal feature extraction. Multi-view 3D reconstruction also takes multiple images as input, but it cannot directly inherit this success because the associations between unordered views are completely ambiguous: there is no usable prior relationship analogous to the temporal coherence of a video. To solve this problem, we propose a novel transformer network for Unordered Multiple Images (UMIFormer). It uses transformer blocks for decoupled intra-view encoding, together with specially designed token rectification blocks that mine the correlations between similar tokens from different views to achieve decoupled inter-view encoding. Afterward, all tokens acquired from the various branches are compressed into a fixed-size compact representation that preserves rich information for reconstruction by leveraging the similarities between tokens. Experiments on ShapeNet confirm that our decoupled learning method is adaptable to unordered multiple images and show that our model outperforms existing SOTA methods by a large margin.
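To make the idea of relating similar tokens across unordered views more concrete, the following is a minimal sketch and not the authors' token rectification block. It assumes each view is already encoded as a list of token vectors, and it blends every token with the mean of its `k` most cosine-similar tokens drawn from the other views. The names `cosine`, `mix_similar_tokens`, `k`, and `alpha` are hypothetical and chosen only for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a > 0 and norm_b > 0 else 0.0


def mix_similar_tokens(views, k=2, alpha=0.5):
    """Blend each token with its k most similar tokens from OTHER views.

    views: list of views; each view is a list of token vectors (lists of floats).
    This is an illustrative assumption, not the method from the paper.
    """
    mixed_views = []
    for vi, view in enumerate(views):
        # Candidate neighbours come only from the other views.
        candidates = [tok for vj, other in enumerate(views) if vj != vi for tok in other]
        mixed_view = []
        for tok in view:
            # Rank candidates by similarity to this token and keep the top k.
            nearest = sorted(candidates, key=lambda c: cosine(tok, c), reverse=True)[:k]
            if not nearest:
                # Single-view input: nothing to mix with, keep the token as-is.
                mixed_view.append(list(tok))
                continue
            mean = [sum(col) / len(nearest) for col in zip(*nearest)]
            mixed_view.append([(1 - alpha) * t + alpha * m for t, m in zip(tok, mean)])
        mixed_views.append(mixed_view)
    return mixed_views


views = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.9, 0.1], [0.1, 0.9]],
    [[1.0, 0.2], [0.2, 1.0]],
]
out = mix_similar_tokens(views, k=1)
```

Because neighbours are selected by similarity and not by view index, reordering the input views only reorders the output in the same way (barring exact similarity ties), which is the property an unordered-view model needs.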


research
08/17/2023

Long-Range Grouping Transformer for Multi-View 3D Reconstruction

Nowadays, transformer networks have demonstrated superior performance in...
research
10/04/2022

Multi-view Human Body Mesh Translator

Existing methods for human mesh recovery mainly focus on single-view fra...
research
03/24/2021

Multi-view 3D Reconstruction with Transformer

Deep CNN-based methods have so far achieved the state of the art results...
research
07/25/2018

Multi-view Reconstructive Preserving Embedding for Dimension Reduction

With the development of feature extraction technique, one sample always ...
research
12/12/2022

CTT-Net: A Multi-view Cross-token Transformer for Cataract Postoperative Visual Acuity Prediction

Surgery is the only viable treatment for cataract patients with visual a...
research
03/29/2021

ViViT: A Video Vision Transformer

We present pure-transformer based models for video classification, drawi...
research
04/28/2023

Making the Invisible Visible: Toward High-Quality Terahertz Tomographic Imaging via Physics-Guided Restoration

Terahertz (THz) tomographic imaging has recently attracted significant a...
