Self-supervised similarity search for large scientific datasets

10/25/2021
by George Stein, et al.

We present the use of self-supervised learning to explore and exploit large unlabeled datasets. Focusing on 42 million galaxy images from the latest data release of the Dark Energy Spectroscopic Instrument (DESI) Legacy Imaging Surveys, we first train a self-supervised model to distil low-dimensional representations that are robust to symmetries, uncertainties, and noise in each image. We then use the representations to construct and publicly release an interactive semantic similarity search tool. We demonstrate how our tool can be used to rapidly discover rare objects given only a single example, increase the speed of crowd-sourcing campaigns, and construct and improve training sets for supervised applications. While we focus on images from sky surveys, the technique is straightforward to apply to any scientific dataset of any dimensionality. The similarity search web app can be found at https://github.com/georgestein/galaxy_search
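
As a rough illustration of the search step described above, the sketch below ranks items by cosine similarity between their low-dimensional representations and returns the nearest neighbours of a single query. It is a minimal sketch under stated assumptions, not the authors' implementation: the representations are random stand-ins for the output of a trained self-supervised model, the similarity metric and brute-force search are assumptions, and the names `normalize_rows` and `most_similar` are hypothetical.

```python
import numpy as np


def normalize_rows(x):
    """Scale each row to unit length so a dot product equals cosine similarity."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.clip(norms, 1e-12, None)


def most_similar(representations, query_index, k=5):
    """Return the indices of the k items most similar to the query, best first.

    Brute-force cosine similarity; assumed here for illustration only.
    """
    unit = normalize_rows(representations)
    scores = unit @ unit[query_index]      # cosine similarity to the query
    scores[query_index] = -np.inf          # exclude the query itself
    top = np.argpartition(-scores, k)[:k]  # unordered top-k candidates
    return top[np.argsort(-scores[top])]   # order candidates by similarity


# Stand-in data: 1000 items with 128-dimensional representations (assumed sizes).
rng = np.random.default_rng(0)
representations = rng.normal(size=(1000, 128))

# Plant a near-duplicate of item 0 at index 1 so the search has something to find.
representations[1] = representations[0] + 0.01 * rng.normal(size=128)

neighbours = most_similar(representations, query_index=0, k=5)
print(neighbours)
```

Given a single example (here item 0), the planted near-duplicate is ranked first. At the scale of tens of millions of images, an approximate nearest-neighbour index would likely replace the brute-force matrix product used here.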

research
09/30/2021

Mining for strong gravitational lenses with self-supervised learning

We employ self-supervised representation learning to distill information...
research
06/15/2022

A Simple Data Mixing Prior for Improving Self-Supervised Learning

Data mixing (e.g., Mixup, Cutmix, ResizeMix) is an essential component f...
research
11/28/2022

Interactive Visual Feature Search

Many visualization techniques have been created to help explain the beha...
research
07/28/2021

Fast and Scalable Image Search For Histology

The expanding adoption of digital pathology has enabled the curation of ...
research
11/23/2021

DABS: A Domain-Agnostic Benchmark for Self-Supervised Learning

Self-supervised learning algorithms, including BERT and SimCLR, have ena...
research
09/14/2022

Self-Supervised Clustering on Image-Subtracted Data with Deep-Embedded Self-Organizing Map

Developing an effective automatic classifier to separate genuine sources...
research
05/31/2022

Glo-In-One: Holistic Glomerular Detection, Segmentation, and Lesion Characterization with Large-scale Web Image Mining

The quantitative detection, segmentation, and characterization of glomer...
