Statistical and Algorithmic Insights for Semi-supervised Learning with Self-training

06/19/2020
by Samet Oymak, et al.

Self-training is a classical approach in semi-supervised learning that has been successfully applied to a variety of machine learning problems. The self-training algorithm generates pseudo-labels for the unlabeled examples and progressively refines these pseudo-labels so that they hopefully coincide with the actual labels. This work provides theoretical insights into the self-training algorithm with a focus on linear classifiers. We first investigate Gaussian mixture models and provide a sharp non-asymptotic finite-sample characterization of the self-training iterations. Our analysis reveals the provable benefits of rejecting samples with low confidence and demonstrates that self-training iterations gracefully improve the model accuracy even if they do get stuck in sub-optimal fixed points. We then demonstrate that regularization and class margin (i.e. separation) are provably important for success, and that a lack of regularization may prevent self-training from identifying the core features in the data. Finally, we discuss statistical aspects of empirical risk minimization with self-training for general distributions. We show how a purely unsupervised notion of generalization, based on clustering via self-training, can be formalized in terms of cluster margin. We then establish a connection between self-training based semi-supervision and the more general problem of learning with heterogeneous data and weak supervision.
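
To make the procedure described in the abstract concrete, the following is a minimal sketch of self-training with a linear classifier on a two-component Gaussian mixture. It is an illustration under stated assumptions, not the paper's exact algorithm: the function names (`make_gmm`, `self_train`, `accuracy`), the averaging estimator, the confidence `threshold`, and the number of iterations are all hypothetical choices made for this example.

```python
import numpy as np

def make_gmm(n, mu, rng):
    """Sample n points x = y * mu + standard Gaussian noise, with y in {-1, +1}."""
    y = rng.choice([-1.0, 1.0], size=n)
    X = y[:, None] * mu + rng.standard_normal((n, mu.shape[0]))
    return X, y

def self_train(X_lab, y_lab, X_unl, n_iters=10, threshold=0.5):
    """Illustrative self-training loop (assumed form, not the paper's).

    Start from a supervised averaging estimate, then repeatedly:
      1. score the unlabeled points with the current unit-norm w,
      2. reject points whose |score| is below `threshold` (low confidence),
      3. pseudo-label the rest with sign(score) and refit by averaging.
    """
    # Supervised initialization from the few labeled examples.
    w = (y_lab[:, None] * X_lab).mean(axis=0)
    w /= np.linalg.norm(w)
    for _ in range(n_iters):
        scores = X_unl @ w
        keep = np.abs(scores) >= threshold      # confidence-based rejection
        if not keep.any():
            break                               # nothing confident enough to use
        pseudo = np.sign(scores[keep])          # pseudo-labels in {-1, +1}
        w = (pseudo[:, None] * X_unl[keep]).mean(axis=0)
        w /= np.linalg.norm(w)
    return w

def accuracy(w, X, y):
    """Fraction of points whose predicted sign matches the true label."""
    return float(np.mean(np.sign(X @ w) == y))
```

A typical use would draw a handful of labeled points and many unlabeled points from `make_gmm`, run `self_train`, and compare `accuracy` on held-out data against the purely supervised initial estimate.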


research
01/17/2022

Contrastive Regularization for Semi-Supervised Learning

Consistency regularization on label predictions becomes a fundamental te...
research
05/16/2022

Sharp Asymptotics of Self-training with Linear Classifier

Self-training (ST) is a straightforward and standard approach in semi-su...
research
01/10/2023

Neighborhood-Regularized Self-Training for Learning with Few Labels

Training deep neural networks (DNNs) with limited supervision has been a...
research
06/25/2021

Self-training Converts Weak Learners to Strong Learners in Mixture Models

We consider a binary classification problem when the data comes from a m...
research
05/30/2022

Conformal Credal Self-Supervised Learning

In semi-supervised learning, the paradigm of self-training refers to the...
research
02/24/2022

Self-Training: A Survey

In recent years, semi-supervised algorithms have received a lot of inter...
research
10/03/2021

Information-Theoretic Generalization Bounds for Iterative Semi-Supervised Learning

We consider iterative semi-supervised learning (SSL) algorithms that ite...
