RETRIEVE: Coreset Selection for Efficient and Robust Semi-Supervised Learning

06/14/2021
by KrishnaTeja Killamsetty, et al.

Semi-supervised learning (SSL) algorithms have had great success in recent years in limited-labeled-data regimes. However, current state-of-the-art SSL algorithms are computationally expensive and entail significant compute time and energy requirements, which can be a major limitation for smaller companies and academic groups. Our main insight is that training on a subset of the unlabeled data, instead of the entire unlabeled set, enables current SSL algorithms to converge faster, thereby significantly reducing computational costs. In this work, we propose RETRIEVE, a coreset selection framework for efficient and robust semi-supervised learning. RETRIEVE selects the coreset by solving a mixed discrete-continuous bi-level optimization problem such that the selected coreset minimizes the labeled-set loss. We use a one-step gradient approximation and show that the resulting discrete optimization problem is approximately submodular, enabling simple greedy algorithms to obtain the coreset. We empirically demonstrate on several real-world datasets that existing SSL algorithms such as VAT, Mean Teacher, and FixMatch, when used with RETRIEVE, achieve a) faster training times and b) better performance when the unlabeled data contains out-of-distribution (OOD) samples or class imbalance. More specifically, we show that, with minimal accuracy degradation, RETRIEVE achieves a speedup of around 3x in the traditional SSL setting, and a speedup of 5x over state-of-the-art (SOTA) robust SSL algorithms in the imbalanced and OOD settings.
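To make the greedy selection idea concrete, here is a toy sketch (not the paper's actual algorithm) of coreset selection via a one-step gradient approximation: each candidate unlabeled point is scored by how well a single gradient step on it would align with the labeled-set gradient, and points are picked greedily up to a budget. The function name, the learning rate, and the use of precomputed per-example gradient vectors are all illustrative assumptions.

```python
import numpy as np

def greedy_coreset(unlabeled_grads, labeled_grad, budget, lr=0.1):
    """Toy greedy coreset selection (hypothetical simplification of RETRIEVE).

    unlabeled_grads: (n, d) array of per-example gradient approximations
                     for the unlabeled points.
    labeled_grad:    (d,) gradient of the labeled-set loss.
    Greedily picks `budget` points whose accumulated one-step update best
    aligns with the labeled-set gradient (so the update reduces labeled loss).
    """
    n = unlabeled_grads.shape[0]
    selected = []
    accum = np.zeros_like(labeled_grad)   # accumulated one-step update so far
    remaining = set(range(n))
    for _ in range(budget):
        best_i, best_gain = None, -np.inf
        for i in remaining:
            # candidate update if point i is added to the coreset
            cand = accum + lr * unlabeled_grads[i]
            # gain: alignment of the candidate update with the labeled gradient
            gain = labeled_grad @ cand
            if gain > best_gain:
                best_gain, best_i = gain, i
        selected.append(best_i)
        accum += lr * unlabeled_grads[best_i]
        remaining.remove(best_i)
    return selected
```

In this simplified linear scoring the greedy loop reduces to picking the top-`budget` points by gradient inner product; the paper's bi-level formulation is richer, but the sketch shows why an (approximately) submodular objective makes simple greedy selection viable.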

Related research

02/17/2020: Class-Imbalanced Semi-Supervised Learning
Semi-Supervised Learning (SSL) has achieved great success in overcoming ...

12/18/2019: RealMix: Towards Realistic Semi-Supervised Deep Learning Algorithms
Semi-Supervised Learning (SSL) algorithms have shown great potential in ...

09/09/2021: FedCon: A Contrastive Framework for Federated Semi-Supervised Learning
Federated Semi-Supervised Learning (FedSSL) has gained rising attention ...

12/19/2020: GLISTER: Generalization based Data Subset Selection for Efficient and Robust Learning
Large scale machine learning and deep models are extremely data-hungry. ...

11/20/2022: An Embarrassingly Simple Baseline for Imbalanced Semi-Supervised Learning
Semi-supervised learning (SSL) has shown great promise in leveraging unl...

05/04/2023: High-dimensional Bayesian Optimization via Semi-supervised Learning with Optimized Unlabeled Data Sampling
Bayesian optimization (BO) is a powerful tool for seeking the global opt...

10/07/2020: Robust Semi-Supervised Learning with Out of Distribution Data
Semi-supervised learning (SSL) based on deep neural networks (DNNs) has ...
