CSP: Self-Supervised Contrastive Spatial Pre-Training for Geospatial-Visual Representations

by Gengchen Mai, et al.

Geo-tagged images are publicly available in large quantities, whereas labels such as object classes are scarce and expensive to collect. Meanwhile, contrastive learning has achieved tremendous success in various natural image and language tasks with limited labeled data. However, existing methods fail to fully leverage geospatial information, which can be paramount to distinguishing objects that are visually similar. To directly leverage the abundant geospatial information associated with images in the pre-training, fine-tuning, and inference stages, we present Contrastive Spatial Pre-Training (CSP), a self-supervised learning framework for geo-tagged images. We use a dual encoder to separately encode the images and their corresponding geo-locations, and use contrastive objectives to learn effective location representations from images, which can be transferred to downstream supervised tasks such as image classification. Experiments show that CSP improves model performance on both the iNat2018 and fMoW datasets. In particular, on iNat2018, CSP significantly boosts model performance, with a 10-34% relative improvement across various labeled training data sampling ratios.
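As a rough illustration of the dual-encoder contrastive idea described above, the sketch below computes a generic InfoNCE-style loss over a batch of paired image and location embeddings, treating the other locations in the batch as negatives. It is a minimal sketch under stated assumptions, not the paper's implementation: the function names (`l2_normalize`, `info_nce_loss`), the `temperature` value, and the toy 2-D embeddings are all hypothetical, and the encoders that would produce the embeddings are omitted.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (a zero vector is left unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def info_nce_loss(image_embs, loc_embs, temperature=0.1):
    """Generic InfoNCE-style loss (hypothetical helper, not the paper's code).

    image_embs[i] and loc_embs[i] are assumed to come from the same
    geo-tagged image; every other location in the batch acts as a negative.
    """
    imgs = [l2_normalize(v) for v in image_embs]
    locs = [l2_normalize(v) for v in loc_embs]
    total = 0.0
    for i, img in enumerate(imgs):
        # Cosine similarity of image i to every location, scaled by temperature.
        logits = [sum(a * b for a, b in zip(img, loc)) / temperature for loc in locs]
        # Numerically stable log-sum-exp over all candidates.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(z - m) for z in logits))
        # Cross-entropy with the matching location (index i) as the target.
        total += log_denom - logits[i]
    return total / len(imgs)


# Toy embeddings: pairs that line up give a low loss, mismatched pairs a high one.
aligned = info_nce_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
shuffled = info_nce_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
```

In this toy setup `aligned` is close to zero while `shuffled` is much larger, which is the behaviour a contrastive objective is meant to reward: an image embedding should be more similar to its own location embedding than to the others in the batch.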




