MetaBEV: Solving Sensor Failures for BEV Detection and Map Segmentation

by Chongjian Ge et al.

Perception systems in modern autonomous vehicles typically take inputs from complementary multi-modal sensors, e.g., LiDAR and cameras. In real-world applications, however, sensor corruptions and failures degrade performance, compromising autonomous safety. In this paper, we propose a robust framework, called MetaBEV, to address extreme real-world environments covering six sensor corruptions and two extreme sensor-missing situations. In MetaBEV, signals from multiple sensors are first processed by modal-specific encoders. Subsequently, a set of dense BEV queries, termed meta-BEV, is initialized. These queries are then processed iteratively by a BEV-Evolving decoder, which selectively aggregates deep features from LiDAR, cameras, or both modalities. The updated BEV representations are further leveraged for multiple 3D prediction tasks. Additionally, we introduce a new M2oE structure to alleviate the performance drop on distinct tasks in multi-task joint learning. Finally, MetaBEV is evaluated on the nuScenes dataset with 3D object detection and BEV map segmentation tasks. Experiments show MetaBEV outperforms prior arts by a large margin on both full and corrupted modalities. For instance, when the LiDAR signal is missing, MetaBEV improves 35.5% detection NDS and 17.7% segmentation mIoU upon the vanilla BEVFusion model; and when the camera signal is absent, MetaBEV still achieves 69.2% mIoU on BEV map segmentation. Moreover, MetaBEV performs fairly against previous methods in both canonical perception and multi-task learning settings, refreshing state-of-the-art nuScenes BEV map segmentation with 70.4% mIoU.
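The iterative query-update scheme described above — dense meta-BEV queries that cross-attend to whichever modality features happen to be available — can be sketched in a few lines. This is a hedged illustration, not the paper's implementation: the single-head attention, the residual update, and all dimensions are assumptions chosen for clarity, and `bev_evolving_step` is a hypothetical name.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, feats):
    """Single-head cross-attention: BEV queries attend to sensor features.

    queries: (N_q, C) meta-BEV queries; feats: (N_kv, C) modality features.
    Returns an (N_q, C) aggregation of the features.
    """
    scale = queries.shape[-1] ** -0.5
    attn = softmax(queries @ feats.T * scale)  # (N_q, N_kv) attention weights
    return attn @ feats

def bev_evolving_step(meta_bev, lidar_feats=None, cam_feats=None):
    """One decoder iteration. Either modality may be None, mimicking the
    sensor-missing scenarios the framework is designed to handle."""
    feats = [f for f in (lidar_feats, cam_feats) if f is not None]
    assert feats, "at least one modality must be available"
    kv = np.concatenate(feats, axis=0)         # stack available modalities
    return meta_bev + cross_attention(meta_bev, kv)  # residual query update

# Usage: an 8x8 BEV grid of queries, camera-only input (LiDAR missing).
rng = np.random.default_rng(0)
meta_bev = rng.standard_normal((64, 32))    # 64 dense BEV queries, C = 32
cam_feats = rng.standard_normal((100, 32))  # features from the camera encoder
out = bev_evolving_step(meta_bev, cam_feats=cam_feats)
print(out.shape)  # (64, 32)
```

Because the queries, not the sensors, anchor the BEV representation, the same update runs unchanged whether one or both modalities are present — which is the property that lets the framework degrade gracefully under sensor failure.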




