Making Binary Classification from Multiple Unlabeled Datasets Almost Free of Supervision

06/12/2023
by   Yuhao Wu, et al.

Training a classifier on a huge amount of supervised data is expensive, or even prohibitive, when the labeling cost is high. A notable advance in learning from weaker forms of supervision is binary classification from multiple unlabeled datasets, which requires knowledge of the exact class priors for all unlabeled datasets. However, the assumption that class priors are available is restrictive in many real-world scenarios. To address this issue, we propose a new problem setting: binary classification from multiple unlabeled datasets with only one pairwise numerical relationship of class priors (MU-OPPO). In this setting, we know only the relative order of the class priors of two datasets among the multiple unlabeled datasets, i.e., which of the two has the higher proportion of positive examples. MU-OPPO does not need the class priors of all unlabeled datasets; it requires only that there exists one pair of unlabeled datasets for which we know which has the larger class prior. Clearly, this form of supervision is easier to obtain, which can make labeling almost free of cost. We propose a novel framework for the MU-OPPO problem that consists of four sequential modules: (i) pseudo label assignment; (ii) confident example collection; (iii) class prior estimation; (iv) classifier training with estimated class priors. Theoretically, we analyze the gap between the estimated and true class priors under the proposed framework. Empirically, we confirm the superiority of our framework with comprehensive experiments: the results show that it yields smaller class-prior estimation errors and better binary classification performance.
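
To make the problem setting concrete, the following is a minimal sketch of the data a learner receives in MU-OPPO. It is an illustration only, not the authors' framework: the Gaussian class-conditionals, the prior values, and the helper names `make_unlabeled_sets` and `pairwise_order` are all assumptions introduced here. The point is that the learner sees several unlabeled sets plus a single ordered pair of dataset indices, and never the priors themselves.

```python
import random
from statistics import mean


def make_unlabeled_sets(priors, n_per_set, seed=0):
    """Draw one unlabeled set per class prior from a toy two-Gaussian mixture.

    Hypothetical helper: the class-conditionals N(+1, 1) and N(-1, 1)
    are assumptions made for illustration only.
    """
    rng = random.Random(seed)
    sets = []
    for prior in priors:
        points = []
        for _ in range(n_per_set):
            # Hidden label: +1 with probability `prior`, otherwise -1.
            label = 1 if rng.random() < prior else -1
            # Only the feature is kept; the label is discarded.
            points.append(rng.gauss(float(label), 1.0))
        sets.append(points)
    return sets


def pairwise_order(priors, i, j):
    """Return the single piece of supervision in MU-OPPO:
    the pair of dataset indices ordered as (larger prior, smaller prior)."""
    return (i, j) if priors[i] > priors[j] else (j, i)


# Hypothetical true priors; hidden from the learner.
true_priors = [0.2, 0.5, 0.8]
unlabeled_sets = make_unlabeled_sets(true_priors, n_per_set=1000)

# The learner is told only that dataset 2 has a larger class prior than dataset 0.
supervision = pairwise_order(true_priors, 0, 2)
```

In this toy setup the mean feature value of each set rises with its class prior, which hints at why even one ordered pair can carry usable signal; the four modules listed in the abstract are not implemented here.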

research
02/01/2021

Binary Classification from Multiple Unlabeled Datasets via Surrogate Set Classification

To cope with high annotation costs, training a classifier only from weak...
research
07/11/2021

Positive-Unlabeled Classification under Class-Prior Shift: A Prior-invariant Approach Based on Density Ratio Estimation

Learning from positive and unlabeled (PU) data is an important problem i...
research
09/19/2018

Positive-Unlabeled Classification under Class Prior Shift and Asymmetric Error

A bottleneck of binary classification from positive and unlabeled data (...
research
07/04/2022

Learning from Multiple Unlabeled Datasets with Partial Risk Regularization

Recent years have witnessed a great success of supervised deep learning,...
research
06/11/2020

Similarity-based Classification: Connecting Similarity Learning to Binary Classification

In real-world classification problems, pairwise supervision (i.e., a pai...
research
10/22/2020

Classification with Rejection Based on Cost-sensitive Classification

The goal of classification with rejection is to avoid risky misclassific...
research
06/22/2021

The Hitchhiker's Guide to Prior-Shift Adaptation

In many computer vision classification tasks, class priors at test time ...
