Sim2Real for Self-Supervised Monocular Depth and Segmentation

by   Nithin Raghavan, et al.

Image-based learning methods for autonomous vehicle perception tasks require large quantities of labelled, real data in order to properly train without overfitting, which can often be incredibly costly. While leveraging the power of simulated data can potentially aid in mitigating these costs, networks trained in the simulation domain usually fail to perform adequately when applied to images in the real domain. Recent advances in domain adaptation have indicated that a shared latent space assumption can help to bridge the gap between the simulation and real domains, allowing the transference of the predictive capabilities of a network from the simulation domain to the real domain. We demonstrate that a twin VAE-based architecture with a shared latent space and auxiliary decoders is able to bridge the sim2real gap without requiring any paired, ground-truth data in the real domain. Using only paired, ground-truth data in the simulation domain, this architecture has the potential to generate perception tasks such as depth and segmentation maps. We compare this method to networks trained in a supervised manner to indicate the merit of these results.


page 1

page 4

page 6


Exploring the Capabilities and Limits of 3D Monocular Object Detection – A Study on Simulation and Real World Data

3D object detection based on monocular camera data is a key enabler for ...

ActiveZero: Mixed Domain Learning for Active Stereovision with Zero Annotation

Traditional depth sensors generate accurate real world depth estimates t...

Towards Optimal Strategies for Training Self-Driving Perception Models in Simulation

Autonomous driving relies on a huge volume of real-world data to be labe...

Plausibility-Based Heuristics for Latent Space Classical Planning

Recent work on LatPlan has shown that it is possible to learn models for...

Cycle Self-Training for Domain Adaptation

Mainstream approaches for unsupervised domain adaptation (UDA) learn dom...

Learning Feature Decomposition for Domain Adaptive Monocular Depth Estimation

Monocular depth estimation (MDE) has attracted intense study due to its ...

A performance contextualization approach to validating camera models for robot simulation

The focus of this contribution is on camera simulation as it comes into ...

Please sign up or login with your details

Forgot password? Click here to reset