ImagiFilter: A resource to enable the semi-automatic mining of images at scale

08/20/2020
by Houda Alberts, et al.

Datasets (semi-)automatically collected from the web can easily scale to millions of entries, but a dataset's usefulness is directly related to how clean and high-quality its examples are. In this paper, we describe and publicly release an image dataset along with pretrained models designed to (semi-)automatically filter out undesirable images from very large image collections, possibly obtained from the web. Our dataset focusses on photographic and/or natural images, a very common use case in computer vision research. We provide annotations for coarse prediction, i.e. photographic vs. non-photographic, and for smaller fine-grained prediction tasks in which we further break down the non-photographic class into five classes: maps, drawings, graphs, icons, and sketches. Results on held-out validation data show that a model architecture with a reduced memory footprint achieves over 96% accuracy on coarse prediction. Our best model achieves 88% accuracy on the hardest fine-grained classification task available. Dataset and pretrained models are available at: https://github.com/houda96/imagi-filter.
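
To make the relation between the two annotation levels concrete, here is a minimal Python sketch of how the five fine-grained classes collapse into the coarse photographic vs. non-photographic decision, and how coarse scores might be used to filter a collection. The label strings, the function names, and the 0.5 threshold are assumptions made for illustration; they are not taken from the released dataset or pretrained models.

```python
from typing import Iterable, List, Tuple

# The five non-photographic classes named in the abstract.
NON_PHOTO_CLASSES = ("maps", "drawings", "graphs", "icons", "sketches")
# Hypothetical label string for the photographic class (an assumption).
PHOTO_CLASS = "photographic"


def to_coarse(label: str) -> str:
    """Map a fine-grained label to its coarse label."""
    if label == PHOTO_CLASS:
        return "photographic"
    if label in NON_PHOTO_CLASSES:
        return "non-photographic"
    raise ValueError(f"unknown label: {label!r}")


def filter_photographic(
    scored: Iterable[Tuple[str, float]], threshold: float = 0.5
) -> List[str]:
    """Keep the paths whose photographic probability meets the threshold.

    `scored` holds (image_path, p_photographic) pairs, assumed to come
    from some coarse classifier; the threshold value is an assumption.
    """
    return [path for path, p_photo in scored if p_photo >= threshold]


if __name__ == "__main__":
    print(to_coarse("icons"))
    print(filter_photographic([("a.jpg", 0.93), ("b.png", 0.12)]))
```

Raising the threshold trades recall for precision: fewer non-photographic images slip through, at the cost of discarding more genuine photographs.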
