Kick Back Relax: Learning to Reconstruct the World by Watching SlowTV

07/20/2023
by Jaime Spencer, et al.

Self-supervised monocular depth estimation (SS-MDE) has the potential to scale to vast quantities of data. Unfortunately, existing approaches limit themselves to the automotive domain, resulting in models that fail to generalize to complex environments such as natural or indoor settings. To address this, we propose SlowTV, a large-scale dataset curated from YouTube that contains an order of magnitude more data than existing automotive datasets. SlowTV contains 1.7M images from a rich diversity of environments, such as worldwide seasonal hiking, scenic driving and scuba diving. Using this dataset, we train an SS-MDE model that provides zero-shot generalization to a large collection of indoor/outdoor datasets. The resulting model outperforms all existing self-supervised approaches and narrows the gap to the supervised state of the art (SoTA), despite using a more efficient architecture. We additionally introduce a collection of best practices to further maximize performance and zero-shot generalization: 1) aspect ratio augmentation, 2) camera intrinsic estimation, 3) support frame randomization and 4) flexible motion estimation. Code is available at https://github.com/jspenmar/slowtv_monodepth.
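The abstract names aspect ratio augmentation and camera intrinsic estimation as best practices but does not describe how they are implemented. Because self-supervised depth training reprojects pixels between frames through the camera intrinsics, any crop or resize applied to an image must also be applied to its intrinsics. The following is a minimal sketch, assuming the augmentation takes the largest crop at a randomly sampled aspect ratio and then resizes it to a fixed height. The names `Intrinsics`, `aspect_ratio_augment`, the candidate ratios and the output height are hypothetical and are not taken from the slowtv_monodepth code.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Intrinsics:
    """Pinhole camera intrinsics in pixels (hypothetical helper type)."""
    fx: float
    fy: float
    cx: float
    cy: float


def crop_intrinsics(K: Intrinsics, left: int, top: int) -> Intrinsics:
    # Cropping moves the image origin, so only the principal point shifts.
    return Intrinsics(K.fx, K.fy, K.cx - left, K.cy - top)


def resize_intrinsics(K: Intrinsics, sx: float, sy: float) -> Intrinsics:
    # Resizing scales focal lengths and principal point per axis.
    # Uses the simple scaling convention; half-pixel offsets are ignored.
    return Intrinsics(K.fx * sx, K.fy * sy, K.cx * sx, K.cy * sy)


def sample_aspect_crop(width: int, height: int, ratios, rng: random.Random):
    """Return (left, top, crop_w, crop_h): the largest crop with a randomly
    chosen aspect ratio (width / height), placed at a random offset."""
    ratio = rng.choice(ratios)
    if width / height > ratio:
        # Image is wider than the target ratio: keep full height, trim width.
        crop_h, crop_w = height, round(height * ratio)
    else:
        # Image is taller than (or equal to) the target: keep full width.
        crop_w, crop_h = width, round(width / ratio)
    left = rng.randint(0, width - crop_w)  # randint is inclusive on both ends
    top = rng.randint(0, height - crop_h)
    return left, top, crop_w, crop_h


def aspect_ratio_augment(width, height, K, ratios, out_height, rng):
    """Sample a crop, then resize it to a fixed height while preserving its
    aspect ratio. Returns the crop box, output size and updated intrinsics."""
    left, top, crop_w, crop_h = sample_aspect_crop(width, height, ratios, rng)
    K = crop_intrinsics(K, left, top)
    out_width = round(out_height * crop_w / crop_h)
    # Separate x/y scales keep K exact even if rounding alters the ratio.
    K = resize_intrinsics(K, out_width / crop_w, out_height / crop_h)
    return (left, top, crop_w, crop_h), (out_width, out_height), K


if __name__ == "__main__":
    rng = random.Random(0)
    K = Intrinsics(fx=500.0, fy=500.0, cx=320.0, cy=240.0)
    for _ in range(3):
        box, size, K_aug = aspect_ratio_augment(
            640, 480, K, [0.75, 1.0, 16 / 9], 192, rng)
        print(box, size, K_aug)
```

The sketch only tracks geometry; the actual pixel cropping and resizing would be done with whatever image library is in use, using the returned crop box and output size.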
