What Makes ImageNet Look Unlike LAION

06/27/2023
by Ali Shirali, et al.

ImageNet was famously created from Flickr image search results. What if we recreated ImageNet instead by searching the massive LAION dataset based on image captions alone? In this work, we carry out this counterfactual investigation. We find that the resulting ImageNet recreation, which we call LAIONet, looks distinctly unlike the original. Specifically, the intra-class similarity of images in the original ImageNet is dramatically higher than it is for LAIONet. Consequently, models trained on ImageNet perform significantly worse on LAIONet. We propose a rigorous explanation for the discrepancy in terms of a subtle, yet important, difference between two plausible causal data-generating processes for the respective datasets, which we support with systematic experimentation. In a nutshell, searching based on an image caption alone creates an information bottleneck that mitigates the selection bias otherwise present in image-based filtering. Our explanation formalizes a long-held intuition in the community that ImageNet images are stereotypical, unnatural, and overly simple representations of the class category. At the same time, it provides a simple and actionable takeaway for future dataset creation efforts.
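To make "intra-class similarity" concrete, here is a minimal sketch of one common way such a quantity could be computed: the mean pairwise cosine similarity between image embeddings that share a class label. The abstract does not specify the exact measure or encoder used, so the function name `intra_class_similarity` and the choice of cosine similarity over precomputed embeddings are assumptions made purely for illustration.

```python
import numpy as np


def intra_class_similarity(embeddings: np.ndarray) -> float:
    """Mean pairwise cosine similarity among the rows of `embeddings`.

    `embeddings` is an (n, d) array holding one feature vector per image
    of a single class. Which encoder produced the vectors is left open;
    this is an illustrative assumption, not the paper's stated procedure.
    """
    n = embeddings.shape[0]
    if n < 2:
        raise ValueError("need at least two embeddings to compare")

    # Normalize each row to unit length so dot products are cosines.
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    unit = embeddings / norms

    # Full cosine-similarity matrix; the diagonal holds self-similarities.
    sims = unit @ unit.T

    # Average over ordered pairs (i, j) with i != j.
    off_diagonal_sum = sims.sum() - np.trace(sims)
    return float(off_diagonal_sum / (n * (n - 1)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical data: a tight cluster and a spread-out cluster.
    center = rng.normal(size=16)
    tight = center + 0.05 * rng.normal(size=(50, 16))
    spread = center + 2.0 * rng.normal(size=(50, 16))
    print("tight :", intra_class_similarity(tight))
    print("spread:", intra_class_similarity(spread))
```

Under this sketch, a class whose images cluster tightly in embedding space scores close to 1, while a more varied class scores lower, which is the kind of gap the abstract reports between ImageNet and LAIONet.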

