Safety-Enhanced Autonomous Driving Using Interpretable Sensor Fusion Transformer

by Hao Shao, et al.

Large-scale deployment of autonomous vehicles has been continually delayed due to safety concerns. On the one hand, comprehensive scene understanding is indispensable: without it, the system is vulnerable to rare but complex traffic situations, such as the sudden appearance of unknown objects. However, reasoning from a global context requires access to sensors of multiple types and adequate fusion of multi-modal sensor signals, which is difficult to achieve. On the other hand, the lack of interpretability in learning models also hampers safety, since failure causes cannot be verified. In this paper, we propose a safety-enhanced autonomous driving framework, named Interpretable Sensor Fusion Transformer (InterFuser), to fully process and fuse information from multi-modal multi-view sensors, achieving comprehensive scene understanding and adversarial event detection. In addition, our framework generates intermediate interpretable features that provide richer semantics and are exploited to constrain actions to remain within safe sets. We conducted extensive experiments on CARLA benchmarks, where our model outperforms prior methods, ranking first on the public CARLA Leaderboard. Our code will be made available at
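To make the "safe set" idea concrete, here is a minimal, hypothetical sketch of how an interpretable intermediate output (e.g., the distance to the nearest predicted obstacle from an object density map) could be used to cap a planner's desired speed. The function names, braking model, and parameter values are illustrative assumptions, not the paper's actual implementation.

```python
# Hypothetical sketch: project a desired speed onto a "safe set" derived
# from an interpretable intermediate prediction (distance to nearest
# obstacle). All names and constants are assumptions for illustration.

import math


def max_safe_speed(distance_to_obstacle: float,
                   max_decel: float = 6.0,
                   reaction_time: float = 0.3) -> float:
    """Largest speed (m/s) from which the vehicle can stop within the given
    distance, assuming constant deceleration after a fixed reaction time.

    Solves v*t_r + v^2 / (2*a) = d for v (positive root).
    """
    a, t_r, d = max_decel, reaction_time, max(distance_to_obstacle, 0.0)
    return -a * t_r + math.sqrt((a * t_r) ** 2 + 2.0 * a * d)


def constrain_action(desired_speed: float,
                     distance_to_obstacle: float) -> float:
    """Clamp the planner's desired speed so the action stays in the safe set."""
    return min(desired_speed, max_safe_speed(distance_to_obstacle))
```

With an obstacle 100 m away, a modest desired speed passes through unchanged; with an obstacle 5 m away, the same controller is forced to a much lower speed. The design choice here (clamping to the safe set rather than penalizing unsafe actions in a loss) mirrors the abstract's framing of constraining actions to be within safe sets at decision time.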


