Improving Self-Supervised Single View Depth Estimation by Masking Occlusion

by Maarten Schellevis, et al.

Single view depth estimation models can be trained from video footage using a self-supervised end-to-end approach with view synthesis as the supervisory signal. This is achieved with a framework that predicts depth and camera motion, with a loss based on reconstructing a target video frame from temporally adjacent frames. In this context, occlusion refers to parts of a scene that can be observed in the target frame but not in a frame used for image reconstruction. Since the image reconstruction is based on sampling from the adjacent frame, and occluded areas by definition cannot be sampled, reconstructed occluded areas corrupt the supervisory signal. In previous work (arXiv:1806.01260), occlusion is handled based on reconstruction error: at each pixel location, only the reconstruction with the lowest error is included in the loss. The current study aims to determine whether the performance of depth estimation models can be improved by ignoring, during training, only those regions that are affected by occlusion. In this work we introduce the occlusion mask, a mask that can be used during training to specifically ignore regions that cannot be reconstructed due to occlusion. The occlusion mask is based entirely on predicted depth information. We introduce two novel loss formulations that incorporate the occlusion mask. The method and implementation of arXiv:1806.01260 serve as the foundation for our modifications as well as the baseline in our experiments. We demonstrate that (i) incorporating the occlusion mask in the loss function improves the performance of single image depth prediction models on the KITTI benchmark, and (ii) loss functions that select from reconstructions based on error are able to ignore some of the reprojection error caused by object motion.
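The two ways of handling occlusion contrasted in the abstract can be illustrated with a minimal NumPy sketch. The first function follows the per-pixel minimum rule the abstract attributes to previous work. The second only shows how a visibility mask, once available, could exclude pixels from a loss. The function names, array shapes, and the plain mean reduction are assumptions made for illustration. The abstract does not specify how the occlusion mask is computed from predicted depth, nor the form of the two proposed loss formulations, so neither is reproduced here.

```python
import numpy as np

def min_reprojection_loss(errors):
    """Per-pixel minimum over reconstructions, averaged over the image.

    errors: array of shape (num_sources, H, W) holding the per-pixel
    reconstruction error of the target frame from each adjacent frame.
    """
    errors = np.asarray(errors, dtype=np.float64)
    # Keep only the lowest error at each pixel location, then average.
    return float(errors.min(axis=0).mean())

def masked_mean_loss(error, visible):
    """Average error over pixels marked as visible (hypothetical helper).

    error:   array of shape (H, W), per-pixel reconstruction error.
    visible: boolean array of shape (H, W); False marks pixels to ignore,
             e.g. pixels flagged as occluded by some mask.
    """
    error = np.asarray(error, dtype=np.float64)
    visible = np.asarray(visible, dtype=bool)
    if not visible.any():
        return 0.0  # nothing to supervise; assumed convention
    return float(error[visible].mean())

# Toy 2x2 example with two source frames.
errors = np.array([[[1.0, 4.0], [2.0, 8.0]],
                   [[3.0, 2.0], [6.0, 0.0]]])
visible = np.array([[True, False], [True, True]])
print(min_reprojection_loss(errors))
print(masked_mean_loss(errors[0], visible))
```

The minimum rule discards a pixel's error in any reconstruction that happens to be worse, whatever the cause, whereas a mask removes only the pixels it flags. This is the distinction the study sets out to evaluate.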




Self-Supervised Attention Learning for Depth and Ego-motion Estimation

We address the problem of depth and ego-motion estimation from image seq...

DS-Depth: Dynamic and Static Depth Estimation via a Fusion Cost Volume

Self-supervised monocular depth estimation methods typically rely on the...

Exploring the Mutual Influence between Self-Supervised Single-Frame and Multi-Frame Depth Estimation

Although both self-supervised single-frame and multi-frame depth estimat...

SUB-Depth: Self-distillation and Uncertainty Boosting Self-supervised Monocular Depth Estimation

We propose SUB-Depth, a universal multi-task training framework for self...

Dyna-DepthFormer: Multi-frame Transformer for Self-Supervised Depth Estimation in Dynamic Scenes

Self-supervised methods have showed promising results on depth estimatio...

Smart Dimming Sunglasses for Photophobia Using Spatial Light Modulator

We propose a smart dimming sunglasses system for individuals with photop...

People as Scene Probes

By analyzing the motion of people and other objects in a scene, we demon...
