BEVFormer: Learning Bird's-Eye-View Representation from Multi-Camera Images via Spatiotemporal Transformers

03/31/2022
by   Zhiqi Li, et al.

3D visual perception tasks, including 3D detection and map segmentation based on multi-camera images, are essential for autonomous driving systems. In this work, we present a new framework termed BEVFormer, which learns unified BEV representations with spatiotemporal transformers to support multiple autonomous driving perception tasks. In a nutshell, BEVFormer exploits both spatial and temporal information by interacting with spatial and temporal space through predefined grid-shaped BEV queries. To aggregate spatial information, we design spatial cross-attention, through which each BEV query extracts spatial features from its regions of interest across camera views. For temporal information, we propose temporal self-attention to recurrently fuse historical BEV information. Our approach achieves a new state of the art of 56.9% NDS on the nuScenes test set, 9.0 points higher than the previous best result and on par with LiDAR-based baselines. We further show that BEVFormer remarkably improves the accuracy of velocity estimation and the recall of objects under low-visibility conditions. The code will be released at https://github.com/zhiqi-li/BEVFormer.
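The abstract's two attention mechanisms can be illustrated with a minimal sketch. This is a hypothetical simplification, not the released implementation: the module name MiniBEVFormerLayer, all tensor shapes, and the use of standard multi-head attention in place of the paper's deformable attention are assumptions, and the camera-geometry reference-point sampling and ego-motion alignment of the previous BEV are omitted.

```python
import torch
import torch.nn as nn

class MiniBEVFormerLayer(nn.Module):
    """Hypothetical simplification of one BEVFormer encoder layer.

    Grid-shaped BEV queries (1) attend to the previous frame's BEV
    features (temporal self-attention) and (2) pull features from the
    multi-camera images (spatial cross-attention). Standard multi-head
    attention stands in for the deformable attention used in the paper.
    """

    def __init__(self, embed_dim=256, num_heads=8):
        super().__init__()
        self.temporal_attn = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)
        self.spatial_attn = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)
        self.ffn = nn.Sequential(
            nn.Linear(embed_dim, 4 * embed_dim), nn.ReLU(),
            nn.Linear(4 * embed_dim, embed_dim),
        )
        self.norm1 = nn.LayerNorm(embed_dim)
        self.norm2 = nn.LayerNorm(embed_dim)
        self.norm3 = nn.LayerNorm(embed_dim)

    def forward(self, bev_queries, prev_bev, cam_feats):
        # bev_queries: (B, H*W, C) grid-shaped BEV queries
        # prev_bev:    (B, H*W, C) BEV features from the previous frame
        # cam_feats:   (B, N_cam * H_img * W_img, C) flattened image features
        # Temporal self-attention: recurrently fuse history BEV features.
        q = self.norm1(bev_queries + self.temporal_attn(bev_queries, prev_bev, prev_bev)[0])
        # Spatial cross-attention: extract features from the camera views.
        q = self.norm2(q + self.spatial_attn(q, cam_feats, cam_feats)[0])
        return self.norm3(q + self.ffn(q))

# Toy usage: a 50x50 BEV grid and 6 cameras with 10x20 feature maps.
B, H, W, C = 1, 50, 50, 256
layer = MiniBEVFormerLayer(embed_dim=C)
bev = torch.randn(B, H * W, C)
prev_bev = torch.randn(B, H * W, C)
cams = torch.randn(B, 6 * 10 * 20, C)
out = layer(bev, prev_bev, cams)  # (1, 2500, 256)
```

In the full model, stacking such layers and feeding each frame's output BEV back in as prev_bev gives the recurrent temporal fusion the abstract describes.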


Related research

BEVerse: Unified Perception and Prediction in Birds-Eye-View for Vision-Centric Autonomous Driving (05/19/2022)
In this paper, we present BEVerse, a unified framework for 3D perception...

MotionNet: Joint Perception and Motion Prediction for Autonomous Driving Based on Bird's Eye View Maps (03/15/2020)
The ability to reliably perceive the environmental states, particularly ...

PanoOcc: Unified Occupancy Representation for Camera-based 3D Panoptic Segmentation (06/16/2023)
Comprehensive modeling of the surrounding 3D world is key to the success...

ST-P3: End-to-end Vision-based Autonomous Driving via Spatial-Temporal Feature Learning (07/15/2022)
Many existing autonomous driving paradigms involve a multi-stage discret...

TBP-Former: Learning Temporal Bird's-Eye-View Pyramid for Joint Perception and Prediction in Vision-Centric Autonomous Driving (03/17/2023)
Vision-centric joint perception and prediction (PnP) has become an emerg...

Fast-BEV: Towards Real-time On-vehicle Bird's-Eye View Perception (01/19/2023)
Recently, the pure camera-based Bird's-Eye-View (BEV) perception removes...

Where were my keys? – Aggregating Spatial-Temporal Instances of Objects for Efficient Retrieval over Long Periods of Time (10/25/2021)
Robots equipped with situational awareness can help humans efficiently f...
