Self-Supervised Learning from Non-Object Centric Images with a Geometric Transformation Sensitive Architecture

04/17/2023

by Taeho Kim, Jong-Min Lee, et al.

Most invariance-based self-supervised methods rely on single-object-centric images (e.g., ImageNet images) for pretraining, learning representations that are invariant to geometric transformations. However, when images are not object-centric, geometric transformations such as random crops and multi-crops can significantly alter the semantics of the image, and the model may struggle to capture location information. For this reason, we propose a Geometric Transformation Sensitive Architecture that learns features sensitive to geometric transformations such as four-fold rotation, random crop, and multi-crop. Our method encourages the student to learn sensitive features by increasing the similarity between overlapping regions, rather than entire views, and by applying rotations to the target feature map. Additionally, we use a patch correspondence loss to capture long-term dependencies. Our approach demonstrates improved performance when using non-object-centric images as pretraining data, compared with other methods that learn geometric-transformation-invariant representations. We surpass the DINO baseline on image classification, semantic segmentation, detection, and instance segmentation, with improvements of 6.1 Acc, 0.6 mIoU, 0.4 AP^b, and 0.1 AP^m.
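
The overlap-plus-rotation objective can be made concrete with a short sketch. The PyTorch code below is a minimal illustration under stated assumptions, not the paper's implementation: the function names (rotate_feature_map, crop_and_resize, overlap_loss) are hypothetical, the overlap boxes are assumed to be precomputed in each feature map's own index space (the teacher box in the coordinates of the already-rotated map), and the per-position cosine-distance form of the loss is one plausible choice.

import torch
import torch.nn.functional as F

def rotate_feature_map(feat, k):
    # Rotate a (B, C, H, W) feature map by k * 90 degrees, matching the
    # four-fold rotation applied to the student's input view.
    return torch.rot90(feat, k, dims=(2, 3))

def crop_and_resize(feat, box, size=7):
    # Slice an overlap box (x0, y0, x1, y1, in feature-map indices) and
    # resample it to a common spatial size so the two regions align.
    x0, y0, x1, y1 = box
    region = feat[:, :, y0:y1, x0:x1]
    return F.interpolate(region, size=(size, size),
                         mode="bilinear", align_corners=False)

def overlap_loss(student_feat, teacher_feat, box_s, box_t, k):
    # Increase similarity only on the region where the two crops overlap,
    # not between pooled whole-view features. Rotating the teacher map by
    # the same k keeps the target sensitive (equivariant) to the rotation
    # rather than invariant to it.
    teacher_feat = rotate_feature_map(teacher_feat, k).detach()  # stop-gradient teacher
    s = F.normalize(crop_and_resize(student_feat, box_s), dim=1)
    t = F.normalize(crop_and_resize(teacher_feat, box_t), dim=1)
    return (1.0 - (s * t).sum(dim=1)).mean()  # mean cosine distance per position

# Toy usage: 14x14 maps from two crops of the same image, student view rotated once.
f_s = torch.randn(2, 256, 14, 14)
f_t = torch.randn(2, 256, 14, 14)
loss = overlap_loss(f_s, f_t, box_s=(2, 2, 10, 10), box_t=(1, 3, 9, 11), k=1)

Matching only the overlap is what lets the target change when the crop moves: a whole-view loss would pull the two crops' features together regardless of where they came from, which is exactly the invariance the abstract argues against for non-object-centric images.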

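The patch correspondence loss for long-term dependencies can likewise be sketched. Again this is an assumption-laden illustration rather than the authors' code: the abstract does not specify the loss form, so the snippet takes one common reading in which each student patch token is matched to its most similar teacher patch token.

import torch
import torch.nn.functional as F

def patch_correspondence_loss(student_tokens, teacher_tokens):
    # student_tokens: (B, N, D), teacher_tokens: (B, M, D) patch embeddings.
    # For every student patch, find the best-matching teacher patch anywhere
    # in the other view and maximize that similarity, tying together distant
    # but corresponding patches (long-range dependencies).
    s = F.normalize(student_tokens, dim=-1)
    t = F.normalize(teacher_tokens, dim=-1).detach()   # stop-gradient teacher
    sim = torch.einsum("bnd,bmd->bnm", s, t)           # all-pairs cosine similarity
    return (1.0 - sim.max(dim=-1).values).mean()

# Toy usage with 196 patch tokens of dimension 384 per view.
loss = patch_correspondence_loss(torch.randn(2, 196, 384),
                                 torch.randn(2, 196, 384))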
