The Cells Out of Sample (COOS) dataset and benchmarks for measuring out-of-sample generalization of image classifiers

06/17/2019
by   Alex X. Lu, et al.

Understanding whether classifiers generalize to out-of-sample datasets is a central problem in machine learning. Microscopy images provide a standardized way to measure the generalization capacity of image classifiers, because the same classes of objects can be imaged under increasingly divergent but controlled factors of variation. We created a public dataset of 132,209 images of mouse cells, COOS-7 (Cells Out Of Sample 7-Class). COOS-7 provides a classification setting in which four test datasets have increasing degrees of covariate shift: some images are random subsets of the training data, while others are from experiments reproduced months later and imaged by different instruments. We benchmarked a range of classification models using different representations, including transferred neural network features, end-to-end classification with a supervised deep CNN, and features from a self-supervised CNN. While most classifiers perform well on test datasets similar to the training dataset, all classifiers failed to generalize their performance to datasets with greater covariate shifts. These baselines highlight the challenges of covariate shifts in image data, and establish metrics for improving the generalization capacity of image classifiers.
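
The evaluation protocol described above (train once, then report error separately on each of four test sets of increasing covariate shift) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not the authors' pipeline: the features are synthetic Gaussian vectors rather than COOS-7 images, the covariate shift is simulated as a constant offset added to every feature, the classifier is a simple nearest-centroid rule, and the names `make_split`, `fit_centroids`, `error_rate`, and `test_1` to `test_4` are hypothetical.

```python
import numpy as np


def make_split(rng, n_per_class, centers, shift, noise=1.0):
    """Draw synthetic features around each class center.

    `shift` is a constant offset added to every feature; it is a stand-in
    (an assumption) for covariate shift such as a different instrument.
    """
    dim = centers.shape[1]
    X = np.concatenate(
        [c + shift + noise * rng.standard_normal((n_per_class, dim)) for c in centers]
    )
    y = np.repeat(np.arange(len(centers)), n_per_class)
    return X, y


def fit_centroids(X, y):
    """Nearest-centroid 'training': one mean feature vector per class."""
    return np.stack([X[y == k].mean(axis=0) for k in np.unique(y)])


def predict(centroids, X):
    """Assign each sample to the class with the closest centroid."""
    sq_dist = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return sq_dist.argmin(axis=1)


def error_rate(centroids, X, y):
    """Fraction of misclassified samples."""
    return float((predict(centroids, X) != y).mean())


rng = np.random.default_rng(0)

# 7 classes mirrors COOS-7; the 16-dimensional feature space is arbitrary.
centers = 4.0 * rng.standard_normal((7, 16))

# Train on unshifted data only.
X_train, y_train = make_split(rng, 200, centers, shift=0.0)
centroids = fit_centroids(X_train, y_train)

# Four hypothetical test sets with increasing simulated shift.
shifts = {"test_1": 0.0, "test_2": 1.0, "test_3": 3.0, "test_4": 6.0}

errors = {}
for name, shift in shifts.items():
    X_test, y_test = make_split(rng, 100, centers, shift=shift)
    errors[name] = error_rate(centroids, X_test, y_test)
    print(f"{name}: shift={shift:.1f} error={errors[name]:.3f}")
```

Reporting one error figure per test set, rather than a single pooled number, is what makes any degradation under larger shifts visible; a pooled score would be dominated by the test sets that resemble the training data.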
