Transformers Solve the Limited Receptive Field for Monocular Depth Prediction

03/22/2021
by   Guanglei Yang, et al.

While convolutional neural networks have had a tremendous impact on various computer vision tasks, they generally demonstrate limitations in explicitly modeling long-range dependencies due to the intrinsic locality of the convolution operation. Transformers, initially designed for natural language processing tasks, have emerged as alternative architectures with an innate global self-attention mechanism to capture long-range dependencies. In this paper, we propose TransDepth, an architecture that benefits from both convolutional neural networks and transformers. To prevent the network from losing its ability to capture local-level details due to the adoption of transformers, we propose a novel decoder that employs gate-based attention mechanisms. Notably, this is the first paper to apply transformers to pixel-wise prediction problems involving continuous labels (i.e., monocular depth prediction and surface normal estimation). Extensive experiments demonstrate that the proposed TransDepth achieves state-of-the-art performance on three challenging datasets. The source code and trained models are available at https://github.com/ygjwd12345/TransDepth.
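The abstract's key design idea is a decoder that fuses local CNN features with global transformer features through a learned gate. Below is a minimal numpy sketch of that gating pattern, not the authors' actual implementation: the function name `gated_fusion` and the parameters `w_gate` / `b_gate` are hypothetical placeholders, and real TransDepth operates on multi-scale feature maps rather than the toy vectors used here.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_fusion(local_feat, global_feat, w_gate, b_gate):
    """Fuse local (CNN-style) and global (transformer-style) features.

    A sigmoid gate in (0, 1), computed from both inputs, decides per
    element how much global context to mix into the local signal.
    (Hypothetical sketch of gate-based attention fusion.)
    """
    both = np.concatenate([local_feat, global_feat], axis=-1)
    gate = sigmoid(both @ w_gate + b_gate)
    return gate * global_feat + (1.0 - gate) * local_feat

rng = np.random.default_rng(0)
C = 8                                       # channel dimension (toy size)
local_feat = rng.standard_normal((4, C))    # 4 spatial positions
global_feat = rng.standard_normal((4, C))
w_gate = rng.standard_normal((2 * C, C)) * 0.1   # random stand-in weights
b_gate = np.zeros(C)

fused = gated_fusion(local_feat, global_feat, w_gate, b_gate)
print(fused.shape)  # (4, 8)
```

Because the gate lies in (0, 1), every fused value is an elementwise convex combination of the two inputs, so the decoder can lean on local detail where it matters and on global context elsewhere.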


Related research:

- Glance-and-Gaze Vision Transformer (06/04/2021)
- ParCNetV2: Oversized Kernel with Enhanced Attention (11/14/2022)
- TranSalNet: Towards perceptually relevant visual saliency prediction (10/07/2021)
- GhostNetV2: Enhance Cheap Operation with Long-Range Attention (11/23/2022)
- LocalViT: Bringing Locality to Vision Transformers (04/12/2021)
- Deep Hyperspectral Unmixing using Transformer Network (03/31/2022)
- Accumulated Trivial Attention Matters in Vision Transformers on Small Datasets (10/22/2022)
