Efficient Self-supervised Vision Pretraining with Local Masked Reconstruction

06/01/2022
by Jun Chen, et al.

Self-supervised learning for computer vision has achieved tremendous progress and improved many downstream vision tasks such as image classification, semantic segmentation, and object detection. Among these, generative self-supervised vision learning approaches such as MAE and BEiT show promising performance. However, their global masked reconstruction mechanism is computationally demanding. To address this issue, we propose local masked reconstruction (LoMaR), a simple yet effective approach that performs masked reconstruction within a small window of 7×7 patches on a simple Transformer encoder, improving the trade-off between efficiency and accuracy compared to global masked reconstruction over the entire image. Extensive experiments show that LoMaR reaches 84.1% top-1 accuracy on ImageNet-1K classification, outperforming MAE by 0.5%. After finetuning the pretrained LoMaR on 384×384 images, it can reach 85.4% top-1 accuracy, surpassing MAE by 0.6%. On MS COCO, LoMaR outperforms MAE by 0.5 AP^box on object detection and 0.5 AP^mask on instance segmentation. LoMaR is especially more computation-efficient on pretraining high-resolution images, e.g., it is 3.1× faster than MAE with 0.2% higher accuracy on pretraining 448×448 images. This local masked reconstruction learning mechanism can be easily integrated into any other generative self-supervised learning approach. Our code will be publicly available.
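The core idea, masking and reconstructing patches only inside a small local window rather than over the whole image, can be illustrated with a short sketch. The snippet below is a minimal, hypothetical PyTorch illustration, not the authors' implementation: `patch_embed`, `encoder`, `pred_head`, `mask_token`, the window size, and the mask ratio are illustrative placeholders, and details such as positional encoding and how visible versus masked tokens are handled are simplified.

```python
import torch
import torch.nn as nn

def local_masked_reconstruction_loss(pixel_patches, patch_embed, encoder, pred_head,
                                     mask_token, window=7, mask_ratio=0.75):
    """Hypothetical sketch of local masked reconstruction.

    pixel_patches: (B, H, W, P) grid of flattened pixel patches (P = patch_size**2 * 3).
    Samples one window x window block, masks a fraction of its patches, encodes the
    block with a plain Transformer encoder, and regresses the pixels of the masked patches.
    """
    B, H, W, P = pixel_patches.shape
    top = torch.randint(0, H - window + 1, (1,)).item()
    left = torch.randint(0, W - window + 1, (1,)).item()
    block = pixel_patches[:, top:top + window, left:left + window, :]
    block = block.reshape(B, window * window, P)          # (B, N, P), N = window**2

    tokens = patch_embed(block)                            # (B, N, D) token embeddings
    B, N, D = tokens.shape
    num_masked = int(N * mask_ratio)
    perm = torch.rand(B, N, device=tokens.device).argsort(dim=1)
    masked_idx = perm[:, :num_masked]                      # token positions to reconstruct

    # Replace masked positions with a learnable mask token; in this sketch all window
    # tokens (visible and masked) are encoded together by the simple encoder.
    tokens = tokens.scatter(1, masked_idx.unsqueeze(-1).expand(-1, -1, D),
                            mask_token.expand(B, num_masked, D))

    # Self-attention cost scales with window**2 tokens rather than with the full image,
    # which is where the efficiency gain over global reconstruction comes from.
    encoded = encoder(tokens)                              # (B, N, D)
    pred = pred_head(encoded)                              # (B, N, P) pixel predictions

    gather_idx = masked_idx.unsqueeze(-1)
    return nn.functional.mse_loss(pred.gather(1, gather_idx.expand(-1, -1, P)),
                                  block.gather(1, gather_idx.expand(-1, -1, P)))
```

Under these assumptions, attention is computed over only window**2 tokens (49 for a 7×7 window) per reconstruction instead of every patch in the image, so the cost of pretraining grows with the window size rather than with the image resolution, which is consistent with the larger reported speedups at higher resolutions such as 448×448.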


research · 05/11/2022 · An Empirical Study Of Self-supervised Learning Approaches For Object Detection With Transformers
Self-supervised learning (SSL) methods such as masked language modeling ...

research · 03/12/2023 · Improving Masked Autoencoders by Learning Where to Mask
Masked image modeling is a promising self-supervised learning method for...

research · 07/12/2023 · Patch n' Pack: NaViT, a Vision Transformer for any Aspect Ratio and Resolution
The ubiquitous and demonstrably suboptimal choice of resizing images to ...

research · 07/17/2023 · Does Visual Pretraining Help End-to-End Reasoning?
We aim to investigate whether end-to-end learning of visual reasoning ca...

research · 04/17/2023 · Self-Supervised Learning from Non-Object Centric Images with a Geometric Transformation Sensitive Architecture
Most invariance-based self-supervised methods rely on single object-cent...

research · 08/22/2023 · Masked Momentum Contrastive Learning for Zero-shot Semantic Understanding
Self-supervised pretraining (SSP) has emerged as a popular technique in ...

research · 11/30/2021 · MC-SSL0.0: Towards Multi-Concept Self-Supervised Learning
Self-supervised pretraining is the method of choice for natural language...
