A Light Touch Approach to Teaching Transformers Multi-view Geometry

11/28/2022
by   Yash Bhalgat, et al.

Transformers are powerful visual learners, in large part due to their conspicuous lack of manually-specified priors. This flexibility can be problematic in tasks that involve multiple-view geometry, due to the near-infinite possible variations in 3D shapes and viewpoints (requiring flexibility), and the precise nature of projective geometry (obeying rigid laws). To resolve this conundrum, we propose a "light touch" approach, guiding visual Transformers to learn multiple-view geometry but allowing them to break free when needed. We achieve this by using epipolar lines to guide the Transformer's cross-attention maps, penalizing attention values outside the epipolar lines and encouraging higher attention along these lines since they contain geometrically plausible matches. Unlike previous methods, our proposal does not require any camera pose information at test-time. We focus on pose-invariant object instance retrieval, where standard Transformer networks struggle, due to the large differences in viewpoint between query and retrieved images. Experimentally, our method outperforms state-of-the-art approaches at object retrieval, without needing pose information at test-time.
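
To make the epipolar guidance concrete, the following is a minimal NumPy sketch of one way such a penalty could be computed. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not specify the loss form, so the binary cross-entropy used here, the distance threshold `thresh`, and the function names `epipolar_mask` and `epipolar_attention_loss` are all hypothetical. It assumes a known fundamental matrix `F` between the two views during training, and that query and key feature maps share the same `h x w` grid.

```python
import numpy as np


def epipolar_mask(F, h, w, thresh=1.0):
    """Build an (h*w, h*w) binary mask (hypothetical helper).

    Entry [i, j] is 1 when key pixel j lies within `thresh` pixels of the
    epipolar line induced by query pixel i, and 0 otherwise.
    Pixels are indexed row-major: i = y * w + x.
    """
    ys, xs = np.mgrid[0:h, 0:w]
    # Homogeneous pixel coordinates (x, y, 1), shape (N, 3).
    pts = np.stack([xs.ravel(), ys.ravel(), np.ones(h * w)], axis=1).astype(float)
    # Row i holds the epipolar line l_i = F @ x_i in the other view.
    lines = pts @ F.T
    # Point-to-line distance |l_i . x_j| / sqrt(a_i^2 + b_i^2).
    num = np.abs(lines @ pts.T)
    den = np.linalg.norm(lines[:, :2], axis=1, keepdims=True)
    dist = num / np.maximum(den, 1e-12)  # guard against degenerate lines
    return (dist <= thresh).astype(float)


def epipolar_attention_loss(attn, mask, eps=1e-7):
    """Assumed loss form: binary cross-entropy between attention and mask.

    High attention off the epipolar line and low attention on it are both
    penalized. `attn` has shape (N, N) with values in [0, 1].
    """
    attn = np.clip(attn, eps, 1.0 - eps)
    bce = -(mask * np.log(attn) + (1.0 - mask) * np.log(1.0 - attn))
    return bce.mean()


# Toy case: pure horizontal camera translation, so epipolar lines are image rows.
F = np.array([[0.0, 0.0, 0.0],
              [0.0, 0.0, -1.0],
              [0.0, 1.0, 0.0]])
mask = epipolar_mask(F, h=2, w=3, thresh=0.5)

uniform_attn = np.full_like(mask, 1.0 / mask.shape[1])
on_line_attn = mask / mask.sum(axis=1, keepdims=True)

loss_uniform = epipolar_attention_loss(uniform_attn, mask)
loss_on_line = epipolar_attention_loss(on_line_attn, mask)
```

In this toy setup, attention concentrated on the epipolar lines yields a lower loss than uniform attention. Because a term like this would only be added to the training objective, the geometry acts as a soft guide rather than a hard constraint, and no fundamental matrix or camera pose is needed at inference, consistent with the claim in the abstract.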

