OCBEV: Object-Centric BEV Transformer for Multi-View 3D Object Detection

by Zhangyang Qi, et al.

Multi-view 3D object detection is becoming popular in autonomous driving due to its high effectiveness and low cost. Most current state-of-the-art detectors follow the query-based bird's-eye-view (BEV) paradigm, which benefits from both the strong perception power of BEV and an end-to-end pipeline. Despite substantial progress, existing works model objects by leveraging the temporal and spatial information of BEV features globally, which causes problems in complex and dynamic autonomous driving scenarios. In this paper, we propose OCBEV, an Object-Centric query-BEV detector that captures the temporal and spatial cues of moving targets more effectively. OCBEV comprises three designs. Object Aligned Temporal Fusion aligns the BEV feature based on ego-motion and the estimated current locations of moving objects, leading to precise instance-level feature fusion. Object Focused Multi-View Sampling samples more 3D features from adaptive local height ranges of objects in each scene to enrich foreground information. Object Informed Query Enhancement replaces part of the pre-defined decoder queries in common DETR-style decoders with positional features of objects at high-confidence locations, introducing more direct object positional priors. Extensive experimental evaluations are conducted on the challenging nuScenes dataset. Our approach achieves a state-of-the-art result, surpassing the traditional BEVFormer by 1.5 NDS points. Moreover, OCBEV converges faster, needing only half of the training iterations to reach comparable performance, which further demonstrates its effectiveness.
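To make the Object Informed Query Enhancement idea more concrete, the following is a minimal sketch rather than the authors' implementation. The abstract does not specify how confidence scores are produced or how positions are encoded, so the BEV confidence heatmap input, the sinusoidal encoding, and the names `sine_position_feature`, `top_k_locations`, and `enhance_queries` are all illustrative assumptions. It only shows the replacement step: the last `k` pre-defined queries are swapped for positional features of the `k` highest-confidence BEV cells.

```python
import math


def sine_position_feature(x, y, dim):
    """Hypothetical sinusoidal encoding of a normalized BEV location.

    x and y are expected in [0, 1]; dim must be a multiple of 4.
    (The encoding choice is an assumption, not taken from the paper.)
    """
    assert dim % 4 == 0, "dim must be a multiple of 4"
    half = dim // 2  # number of features devoted to each coordinate
    feat = []
    for coord in (x, y):
        for i in range(half // 2):
            freq = 1.0 / (10000 ** (2 * i / half))
            feat.append(math.sin(coord * freq))
            feat.append(math.cos(coord * freq))
    return feat


def top_k_locations(heatmap, k):
    """Return (row, col) indices of the k highest-confidence BEV cells."""
    cells = [
        (score, r, c)
        for r, row in enumerate(heatmap)
        for c, score in enumerate(row)
    ]
    cells.sort(key=lambda t: t[0], reverse=True)
    return [(r, c) for _, r, c in cells[:k]]


def enhance_queries(learned_queries, heatmap, k):
    """Replace the last k pre-defined queries with object positional features.

    learned_queries: list of N query vectors (lists of floats, equal length).
    heatmap: H x W nested list of confidence scores (assumed to be given).
    """
    n = len(learned_queries)
    assert 0 <= k <= n, "k must not exceed the number of queries"
    dim = len(learned_queries[0])
    h, w = len(heatmap), len(heatmap[0])
    object_queries = [
        # Use the cell center, normalized to [0, 1], as the object position.
        sine_position_feature((c + 0.5) / w, (r + 0.5) / h, dim)
        for r, c in top_k_locations(heatmap, k)
    ]
    # Keep the first N - k pre-defined queries and append the object queries.
    return learned_queries[: n - k] + object_queries
```

In this sketch the total number of queries stays fixed, so a DETR-style decoder could consume the result unchanged; how many queries to replace (`k`) is a free parameter here and is not taken from the paper.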



DETR4D: Direct Multi-View 3D Object Detection with Sparse Attention

3D object detection with surround-view images is an essential task for a...

Multi-View Adaptive Fusion Network for 3D Object Detection

3D object detection based on LiDAR-camera fusion is becoming an emerging...

BEVDet: High-performance Multi-camera 3D Object Detection in Bird-Eye-View

Autonomous driving perceives the surrounding environment for decision ma...

Graph-DETR3D: Rethinking Overlapping Regions for Multi-View 3D Object Detection

3D object detection from multiple image views is a fundamental and chall...

Temporal Enhanced Training of Multi-view 3D Object Detector via Historical Object Prediction

In this paper, we propose a new paradigm, named Historical Object Predic...

X-view: Non-egocentric Multi-View 3D Object Detector

3D object detection algorithms for autonomous driving reason about 3D ob...

3D Video Object Detection with Learnable Object-Centric Global Optimization

We explore long-term temporal visual correspondence-based optimization f...
