Scaling Self-Supervised End-to-End Driving with Multi-View Attention Learning

by Yi Xiao, et al.

In end-to-end driving, large amounts of expert driving demonstrations are used to train an agent that mimics the expert by predicting its control actions. This process is self-supervised on vehicle signals (e.g., steering angle, acceleration) and requires no extra costly supervision such as human labeling. Yet, progress in self-supervised end-to-end driving has largely given way to modular end-to-end models, which require label-intensive data formats such as semantic segmentation during training. However, we argue that the latest self-supervised end-to-end models were developed under sub-optimal conditions, with low-resolution images and no attention mechanisms. Further, those models are confined to a limited field of view, far from human visual cognition, which can quickly attend to far-apart scene features, a trait that provides a useful inductive bias. In this context, we present a new end-to-end model, trained by self-supervised imitation learning, that leverages a large field of view and a self-attention mechanism. These settings contribute more to the agent's understanding of the driving scene, which leads to a better imitation of human drivers. With only self-supervised training data, our model achieves near-expert performance on CARLA's NoCrash metrics and could rival state-of-the-art (SOTA) models that require large amounts of human-labeled data. To facilitate further research, our code will be released.
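The abstract combines two ideas: the training target is the expert's own logged vehicle signals (no human labels), and a self-attention mechanism lets features from a wide, multi-view field of view attend to one another. The sketch below illustrates both under loudly stated assumptions: it is not the authors' architecture. Per-view feature tokens are assumed to come from some image encoder (not shown), attention is a single head of standard scaled dot-product self-attention, the pooling and linear head are hypothetical, and the L1 imitation loss is an illustrative choice. All function names (`self_attention`, `predict_controls`, `imitation_l1_loss`) are hypothetical.

```python
import math
import random


def matmul(a, b):
    """Multiply matrix a (n x k) by matrix b (k x m), both lists of lists."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(m):
    return [list(row) for row in zip(*m)]


def softmax(row):
    # Subtract the max for numerical stability.
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over all tokens.

    tokens is an n x d matrix whose rows may come from different camera views,
    so any token can attend to features in any other view.
    """
    q, k, v = matmul(tokens, w_q), matmul(tokens, w_k), matmul(tokens, w_v)
    scale = 1.0 / math.sqrt(len(w_k[0]))
    scores = matmul(q, transpose(k))
    weights = [softmax([s * scale for s in row]) for row in scores]
    return matmul(weights, v), weights


def predict_controls(view_features, w_q, w_k, w_v, w_head):
    """Hypothetical policy: merge per-view tokens, attend, mean-pool, regress."""
    tokens = [tok for view in view_features for tok in view]
    attended, _ = self_attention(tokens, w_q, w_k, w_v)
    pooled = [sum(col) / len(attended) for col in zip(*attended)]
    # Linear head maps pooled features to [steering, acceleration].
    return matmul([pooled], w_head)[0]


def imitation_l1_loss(predicted, expert):
    """Mean absolute error against the expert's logged vehicle signals."""
    return sum(abs(p - e) for p, e in zip(predicted, expert)) / len(expert)


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


if __name__ == "__main__":
    rng = random.Random(0)
    d = 4
    # Three hypothetical views (left, front, right), two feature tokens each.
    views = [random_matrix(2, d, rng) for _ in range(3)]
    w_q, w_k, w_v = (random_matrix(d, d, rng) for _ in range(3))
    w_head = random_matrix(d, 2, rng)
    controls = predict_controls(views, w_q, w_k, w_v, w_head)
    expert = [0.1, 0.5]  # logged steering angle and acceleration
    print(controls, imitation_l1_loss(controls, expert))
```

Because the tokens from every view sit in one sequence, a token from the left camera can weight a token from the right camera directly, which is the kind of far-apart attention the abstract motivates. Here the weight matrices are random; a trained model would learn them by minimizing the imitation loss over many recorded demonstrations.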




End-to-End Driving via Self-Supervised Imitation Learning Using Camera and LiDAR Data

In autonomous driving, the end-to-end (E2E) driving approach that predic...

End-to-End Urban Driving by Imitating a Reinforcement Learning Coach

End-to-end approaches to autonomous driving commonly rely on expert demo...

ES-MVSNet: Efficient Framework for End-to-end Self-supervised Multi-View Stereo

Compared to the multi-stage self-supervised multi-view stereo (MVS) meth...

Grounding Human-to-Vehicle Advice for Self-driving Vehicles

Recent success suggests that deep neural control networks are likely to ...

Attend and Segment: Attention Guided Active Semantic Segmentation

In a dynamic environment, an agent with a limited field of view/resource...

Self-Supervised Simultaneous Multi-Step Prediction of Road Dynamics and Cost Map

While supervised learning is widely used for perception modules in conve...

Learning Accurate and Human-Like Driving using Semantic Maps and Attention

This paper investigates how end-to-end driving models can be improved to...
