CropMix: Sampling a Rich Input Distribution via Multi-Scale Cropping

05/31/2022
by Junlin Han, et al.

We present CropMix, a simple method for producing a rich input distribution from the original dataset distribution. A single random crop may inadvertently capture only limited or irrelevant information, such as pure background or unrelated objects. CropMix instead crops an image multiple times using distinct crop scales, thereby ensuring that multi-scale information is captured. The new input distribution, which serves as training data for a number of vision tasks, is then formed by simply mixing the multiple cropped views. We first demonstrate that CropMix can be seamlessly applied to virtually any training recipe and neural network architecture for classification tasks. CropMix improves the performance of image classifiers across several benchmark tasks without sacrificing computational simplicity and efficiency. Moreover, we show that CropMix benefits both contrastive learning and masked image modeling, yielding more powerful representations that achieve better results when transferred to downstream tasks. Code is available at GitHub.
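The crop-then-mix idea described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' released implementation. The function names `random_crop` and `cropmix`, the three scale ranges, the nearest-neighbor resize, and the use of a Dirichlet-weighted convex combination as the mixing step are all assumptions made for illustration; the abstract only states that views cropped at distinct scales are mixed.

```python
import numpy as np


def random_crop(image, scale_range, out_size, rng):
    """Take a random roughly square crop covering a sampled fraction of the
    image area, then resize it to (out_size, out_size) by nearest neighbor."""
    h, w = image.shape[:2]
    area_frac = rng.uniform(*scale_range)
    side = max(1, int(round(np.sqrt(area_frac * h * w))))
    ch, cw = min(side, h), min(side, w)  # clamp the crop to the image bounds
    top = rng.integers(0, h - ch + 1)
    left = rng.integers(0, w - cw + 1)
    crop = image[top:top + ch, left:left + cw]
    # Nearest-neighbor resize via integer index maps (keeps the sketch dependency-free).
    rows = np.arange(out_size) * ch // out_size
    cols = np.arange(out_size) * cw // out_size
    return crop[rows][:, cols]


def cropmix(image, scale_ranges, out_size=32, rng=None):
    """Hypothetical CropMix-style augmentation: one crop per scale range,
    mixed with random convex weights (an assumed, mixup-like mixing step)."""
    rng = np.random.default_rng() if rng is None else rng
    views = np.stack([
        random_crop(image, s, out_size, rng).astype(np.float64)
        for s in scale_ranges
    ])
    # Assumption: weights are drawn from a flat Dirichlet, so they sum to 1.
    weights = rng.dirichlet(np.ones(len(scale_ranges)))
    return np.tensordot(weights, views, axes=1)


# Example: mix a small, a medium, and a large crop of a random test image.
example_rng = np.random.default_rng(0)
example_image = example_rng.random((64, 48, 3))
example_scales = [(0.01, 0.34), (0.34, 0.67), (0.67, 1.0)]  # illustrative values
mixed = cropmix(example_image, example_scales, out_size=32, rng=example_rng)
```

Because each view comes from a different scale range, the mixed output combines fine detail from the small crop with broader context from the large one. In a training pipeline this function would take the place of the single random crop applied to each sample.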


