A Study on the Generality of Neural Network Structures for Monocular Depth Estimation

by Jinwoo Bae, et al.

Monocular depth estimation has been widely studied, and significant performance improvements have recently been reported. However, most previous works are evaluated on only a few benchmark datasets, such as KITTI, and none of them provides an in-depth analysis of the generalization performance of monocular depth estimation. In this paper, we investigate in depth how various backbone networks (e.g., CNN and Transformer models) affect the generalization of monocular depth estimation. First, we evaluate state-of-the-art models on both in-distribution datasets and out-of-distribution datasets that were never seen during network training. Then, we investigate the internal properties of the representations from the intermediate layers of CNN- and Transformer-based models using synthetic texture-shifted datasets. Through extensive experiments, we observe that Transformers exhibit a strong shape bias, whereas CNNs exhibit a strong texture bias. We also find that texture-biased models generalize worse for monocular depth estimation than shape-biased models. We demonstrate that similar trends appear on real-world driving datasets captured under diverse environments. Lastly, we conduct a comprehensive ablation study with various backbone networks used in modern strategies. The experiments demonstrate that the intrinsic locality of CNNs and the self-attention of Transformers induce texture bias and shape bias, respectively.
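The abstract compares models on in-distribution and out-of-distribution data but does not name the error measures. The sketch below assumes the metrics commonly reported on KITTI (AbsRel, SqRel, RMSE, and the δ < 1.25 threshold accuracy), per-image median scaling for scale-ambiguous self-supervised predictions, and an 80 m depth cap. The helpers `depth_metrics` and `generalization_gap` are hypothetical illustrations, not the authors' evaluation code.

```python
import math
import statistics
from typing import Dict, Sequence


def depth_metrics(
    pred: Sequence[float],
    gt: Sequence[float],
    median_scaling: bool = False,
    min_depth: float = 1e-3,  # assumed evaluation range, not taken from the paper
    max_depth: float = 80.0,
) -> Dict[str, float]:
    """Common monocular depth error metrics over pixels with valid ground truth."""
    # Keep only pixels whose ground-truth depth lies inside the evaluation range.
    pairs = [(p, g) for p, g in zip(pred, gt) if min_depth < g < max_depth]
    if not pairs:
        raise ValueError("no valid ground-truth pixels")
    preds = [p for p, _ in pairs]
    gts = [g for _, g in pairs]
    if median_scaling:
        # Self-supervised models predict depth only up to scale; align the medians.
        scale = statistics.median(gts) / statistics.median(preds)
        preds = [p * scale for p in preds]
    # Clamp predictions to the evaluation range (also keeps g / p finite).
    preds = [min(max(p, min_depth), max_depth) for p in preds]
    n = len(gts)
    abs_rel = sum(abs(p - g) / g for p, g in zip(preds, gts)) / n
    sq_rel = sum((p - g) ** 2 / g for p, g in zip(preds, gts)) / n
    rmse = math.sqrt(sum((p - g) ** 2 for p, g in zip(preds, gts)) / n)
    # Fraction of pixels whose prediction is within a factor of 1.25 of the ground truth.
    delta1 = sum(max(p / g, g / p) < 1.25 for p, g in zip(preds, gts)) / n
    return {"abs_rel": abs_rel, "sq_rel": sq_rel, "rmse": rmse, "delta1": delta1}


def generalization_gap(
    id_metrics: Dict[str, float], ood_metrics: Dict[str, float], key: str = "abs_rel"
) -> float:
    """Increase in an error metric from in-distribution to out-of-distribution data."""
    return ood_metrics[key] - id_metrics[key]
```

For example, `depth_metrics([2.0, 4.0], [1.0, 2.0])` reports an AbsRel of 1.0, while the same call with `median_scaling=True` rescales the predictions to match the ground truth exactly. Under this framing, a smaller `generalization_gap` for one backbone would correspond to the better out-of-distribution generalization that the abstract attributes to shape-biased models.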




