SVT-Net: A Super Light-Weight Network for Large Scale Place Recognition using Sparse Voxel Transformers

by Zhaoxin Fan, et al.
Renmin University of China
Nanjing University
Tsinghua University

Point cloud-based large-scale place recognition is fundamental to many applications, such as Simultaneous Localization and Mapping (SLAM). Although previous methods have achieved good performance by learning short-range local features, long-range contextual properties have long been neglected, and model size has become a bottleneck for wider adoption. In this paper, we propose SVT-Net, a super light-weight network for large-scale place recognition. Building on top of the high-efficiency 3D Sparse Convolution (SP-Conv), we propose an Atom-based Sparse Voxel Transformer (ASVT) and a Cluster-based Sparse Voxel Transformer (CSVT) to learn both short-range local features and long-range contextual features. Consisting of ASVT and CSVT, SVT-Net achieves state-of-the-art performance in terms of both accuracy and speed with a super-light model size (0.9M). Two simplified versions of SVT-Net, named ASVT-Net and CSVT-Net, are also introduced; they also achieve state-of-the-art performance while further reducing the model size to 0.8M and 0.4M, respectively.



Attentive Rotation Invariant Convolution for Point Cloud-based Large Scale Place Recognition

Autonomous Driving and Simultaneous Localization and Mapping (SLAM) are b...

HiTPR: Hierarchical Transformer for Place Recognition in Point Cloud

Place recognition or loop closure detection is one of the core component...

Contextual Attention Network: Transformer Meets U-Net

Currently, convolutional neural networks (CNN) (e.g., U-Net) have become...

TransLoc3D : Point Cloud based Large-scale Place Recognition using Adaptive Receptive Fields

Place recognition plays an essential role in the field of autonomous dri...

GIDP: Learning a Good Initialization and Inducing Descriptor Post-enhancing for Large-scale Place Recognition

Large-scale place recognition is a fundamental but challenging task, whi...

DSVT: Dynamic Sparse Voxel Transformer with Rotated Sets

Designing an efficient yet deployment-friendly 3D backbone to handle spa...

Sparse Cross-scale Attention Network for Efficient LiDAR Panoptic Segmentation

Two major challenges of 3D LiDAR Panoptic Segmentation (PS) are that poi...
