On Statistical Bias In Active Learning: How and When To Fix It

by Sebastian Farquhar et al.

Active learning is a powerful tool when labelling data is expensive, but it introduces a bias because the training data no longer follows the population distribution. We formalize this bias and investigate the situations in which it can be harmful and sometimes even helpful. We further introduce novel corrective weights to remove the bias when doing so is beneficial. Through this, our work not only provides a useful mechanism for improving active learning, but also explains the empirical success of existing approaches that ignore this bias. In particular, we show that this bias can be actively helpful when training overparameterized models, such as neural networks, with relatively little data.
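The corrective weights mentioned above can be illustrated with a short sketch. Assuming a LURE-style (Levelled Unbiased Risk Estimator) weighting scheme in the spirit of Farquhar et al. (2021), where the m-th acquired point was drawn without replacement from a pool of N candidates with proposal probability q, the weighted loss recovers an unbiased risk estimate; the function names and exact form here are illustrative and should be checked against the paper:

```python
import numpy as np

def lure_weights(q_probs, N):
    """Corrective weights for actively acquired points.

    q_probs[m-1] is the proposal probability with which the m-th point
    (in acquisition order) was chosen from the remaining pool; N is the
    total pool size. Requires M < N. Sketch of a LURE-style weighting;
    not guaranteed to match the paper's exact estimator.
    """
    M = len(q_probs)
    weights = np.empty(M)
    for m in range(1, M + 1):
        q = q_probs[m - 1]
        # For uniform sampling q = 1/(N-m+1), this reduces to weight 1,
        # recovering the plain empirical mean.
        weights[m - 1] = 1 + (N - M) / (N - m) * (1 / ((N - m + 1) * q) - 1)
    return weights

def weighted_risk(losses, weights):
    """Bias-corrected empirical risk: mean of weighted per-point losses."""
    return float(np.mean(np.asarray(losses) * weights))
```

A quick sanity check: when the acquisition distribution is uniform over the remaining pool (i.e. no active-learning bias), every weight equals one and the corrected risk coincides with the ordinary average loss.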




Related papers:

Characterizing the robustness of Bayesian adaptive experimental designs to active learning bias
  Bayesian adaptive experimental design is a form of active learning, whic...

Addressing Bias in Active Learning with Depth Uncertainty Networks... or Not
  Farquhar et al. [2021] show that correcting for active learning bias wit...

Interpretable Active Learning
  Active learning has long been a topic of study in machine learning. Howe...

Depth Uncertainty Networks for Active Learning
  In active learning, the size and complexity of the training dataset chan...

Active Testing: Sample-Efficient Model Evaluation
  We introduce active testing: a new framework for sample-efficient model ...

Mitigating sampling bias in risk-based active learning via an EM algorithm
  Risk-based active learning is an approach to developing statistical clas...

Disambiguation of Company names via Deep Recurrent Networks
  Named Entity Disambiguation is the Natural Language Processing task of id...
