Are labels informative in semi-supervised learning? – Estimating and leveraging the missing-data mechanism

02/15/2023
by Aude Sportisse, et al.

Semi-supervised learning (SSL) is a powerful technique for leveraging unlabeled data to improve machine learning models, but it can be affected by the presence of “informative” labels, which occur when some classes are more likely to be labeled than others. In the missing-data literature, such labels are called missing not at random (MNAR). In this paper, we propose a novel approach to address this issue by estimating the missing-data mechanism and using inverse propensity weighting to debias any SSL algorithm, including those using data augmentation. We also propose a likelihood ratio test to assess whether labels are indeed informative. Finally, we demonstrate the performance of the proposed methods on different datasets, in particular on two medical datasets for which we design pseudo-realistic missing-data scenarios.
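To make the idea of inverse propensity weighting concrete, here is a minimal sketch rather than the authors' estimator. It assumes that the probability of a point being labeled depends only on its class, and that these per-class propensities are already known; the paper instead estimates them. The function names `ipw_mean_loss` and `simulate`, as well as the class proportions, propensities, and per-class losses, are hypothetical values chosen for illustration.

```python
import numpy as np


def ipw_mean_loss(losses, labels, propensities, n_total):
    """Inverse-propensity-weighted estimate of the mean loss over all
    n_total points, computed from the labeled points only.

    losses:       per-example losses of the labeled points
    labels:       integer class of each labeled point
    propensities: propensities[k] = P(point is labeled | class k), assumed known
    """
    weights = 1.0 / np.asarray(propensities)[labels]
    return float(np.sum(weights * losses) / n_total)


def simulate(n=200_000, seed=0):
    """Toy experiment (all numbers are illustrative assumptions)."""
    rng = np.random.default_rng(seed)
    class_probs = np.array([0.5, 0.3, 0.2])    # class proportions
    propensities = np.array([0.8, 0.3, 0.1])   # rarer classes are labeled less often
    class_loss = np.array([1.0, 2.0, 4.0])     # loss depends on the class

    y = rng.choice(3, size=n, p=class_probs)
    loss = class_loss[y]
    labeled = rng.random(n) < propensities[y]  # informative (class-dependent) labeling

    full = float(loss.mean())                  # target: mean loss if every label were seen
    naive = float(loss[labeled].mean())        # unweighted mean over labeled points
    ipw = ipw_mean_loss(loss[labeled], y[labeled], propensities, n)
    return full, naive, ipw


if __name__ == "__main__":
    full, naive, ipw = simulate()
    print(f"full-data mean loss: {full:.3f}")
    print(f"naive labeled mean:  {naive:.3f}")
    print(f"IPW estimate:        {ipw:.3f}")
```

In this toy setting the unweighted mean over labeled points is biased toward the frequently labeled class, while reweighting each labeled point by the inverse of its class propensity recovers the full-data mean up to sampling noise. The same reweighting can be applied to the supervised part of an SSL objective once the propensities have been estimated.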

research
04/18/2012

Semi-Supervised learning with Density-Ratio Estimation

In this paper, we study statistical properties of semi-supervised learni...
research
04/05/2019

On missing label patterns in semi-supervised learning

We investigate model based classification with partially labelled traini...
research
11/27/2018

Robust Semi-Supervised Learning when Labels are Missing at Random

Semi-supervised learning methods are motivated by the relative paucity o...
research
07/18/2022

Deeply-Learned Generalized Linear Models with Missing Data

Deep Learning (DL) methods have dramatically increased in popularity in ...
research
03/28/2018

Semi-supervised learning for structured regression on partially observed attributed graphs

Conditional probabilistic graphical models provide a powerful framework ...
research
05/15/2022

Imputations for High Missing Rate Data in Covariates via Semi-supervised Learning Approach

Advancements in data collection techniques and the heterogeneity of data...
research
10/11/2022

Combining datasets to increase the number of samples and improve model fitting

For many use cases, combining information from different datasets can be...
