Sparse4D v2: Recurrent Temporal Fusion with Sparse Model

05/23/2023
by   Xuewu Lin, et al.
0

Sparse algorithms offer great flexibility for multi-view temporal perception tasks. In this paper, we present an enhanced version of Sparse4D, in which we improve the temporal fusion module by implementing a recursive form of multi-frame feature sampling. By effectively decoupling image features and structured anchor features, Sparse4D enables a highly efficient transformation of temporal features, thereby facilitating temporal fusion solely through the frame-by-frame transmission of sparse features. The recurrent temporal fusion approach provides two main benefits. Firstly, it reduces the computational complexity of temporal fusion from O(T) to O(1), resulting in significant improvements in inference speed and memory usage. Secondly, it enables the fusion of long-term information, leading to more pronounced performance improvements due to temporal fusion. Our proposed approach, Sparse4Dv2, further enhances the performance of the sparse perception algorithm and achieves state-of-the-art results on the nuScenes 3D detection benchmark. Code will be available at <https://github.com/linxuewu/Sparse4D>.

READ FULL TEXT
research
03/21/2023

Exploring Object-Centric Temporal Modeling for Efficient Multi-View 3D Object Detection

In this paper, we propose a long-sequence modeling framework, named Stre...
research
10/05/2022

Time Will Tell: New Outlooks and A Baseline for Temporal Multi-View 3D Object Detection

While recent camera-only 3D detection methods leverage multiple timestep...
research
11/19/2022

Sparse4D: Multi-view 3D Object Detection with Sparse Spatial-Temporal Fusion

Bird-eye-view (BEV) based methods have made great progress recently in m...
research
08/21/2023

UnLoc: A Unified Framework for Video Localization Tasks

While large-scale image-text pretrained models such as CLIP have been us...
research
08/08/2023

CheXFusion: Effective Fusion of Multi-View Features using Transformers for Long-Tailed Chest X-Ray Classification

Medical image classification poses unique challenges due to the long-tai...
research
07/12/2022

M-FUSE: Multi-frame Fusion for Scene Flow Estimation

Recently, neural network for scene flow estimation show impressive resul...
research
08/04/2021

Recursive Fusion and Deformable Spatiotemporal Attention for Video Compression Artifact Reduction

A number of deep learning based algorithms have been proposed to recover...

Please sign up or login with your details

Forgot password? Click here to reset