Active Learning from the Web

10/15/2022
by Ryoma Sato, et al.

Labeling data is one of the most costly processes in machine learning pipelines. Active learning is a standard approach to alleviating this problem. Pool-based active learning first builds a pool of unlabelled data and then iteratively selects data to be labeled, so that the total number of required labels is minimized while model performance stays high. Many effective criteria for choosing data from the pool have been proposed in the literature. However, how to build the pool is less explored. Specifically, most methods assume that a task-specific pool is given for free. In this paper, we argue that such a task-specific pool is not always available and propose using the vast amount of unlabelled data on the Web as the pool to which active learning is applied. Because the pool is extremely large, relevant data are likely to exist in it for many tasks, and we do not need to explicitly design and build a pool for each task. The challenge is that we cannot exhaustively compute the acquisition scores of all data because of the size of the pool. We propose an efficient method, Seafaring, that retrieves data that are informative for active learning from the Web using a user-side information retrieval algorithm. In the experiments, we use the online Flickr environment as the pool for active learning. This pool contains more than ten billion images and is several orders of magnitude larger than the pools used in the existing active learning literature. We confirm that our method performs better than existing approaches that use a small unlabelled pool.
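To make the pool-based setting concrete, the following is a minimal sketch of a generic active learning loop. It is not the Seafaring method. The nearest-centroid model, the entropy acquisition score, and all function names here are assumptions chosen for illustration, and the sketch assumes the initial labeled set contains at least one example of every class. Note that each round scores every item in the pool, which is exactly the exhaustive step the abstract describes as infeasible for a Web-scale pool.

```python
import numpy as np

def entropy(probs):
    """Shannon entropy of each row of class probabilities."""
    p = np.clip(probs, 1e-12, 1.0)
    return -(p * np.log(p)).sum(axis=1)

def fit_centroids(X, y, n_classes):
    """Toy model (an assumption for this sketch): one mean vector per class."""
    return np.stack([X[y == c].mean(axis=0) for c in range(n_classes)])

def predict_proba(centroids, X):
    """Softmax over negative squared distances to the class centroids."""
    dist = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    logits = -dist
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    e = np.exp(logits)
    return e / e.sum(axis=1, keepdims=True)

def active_learning_loop(pool_X, oracle, init_idx, n_classes, n_rounds):
    """Label one pool item per round, picking the highest-entropy item."""
    labeled = list(init_idx)
    labels = [oracle(i) for i in labeled]
    for _ in range(n_rounds):
        centroids = fit_centroids(pool_X[labeled], np.array(labels), n_classes)
        # Exhaustive scoring: every pool item gets an acquisition score.
        scores = entropy(predict_proba(centroids, pool_X))
        scores[labeled] = -np.inf  # never re-select already labeled items
        pick = int(np.argmax(scores))
        labeled.append(pick)
        labels.append(oracle(pick))  # the costly labeling step
    return labeled, labels

# Synthetic two-class pool; the oracle stands in for a human annotator.
rng = np.random.default_rng(0)
pool_X = np.vstack([rng.normal(-2, 1, (50, 2)), rng.normal(2, 1, (50, 2))])
true_y = np.array([0] * 50 + [1] * 50)
labeled, labels = active_learning_loop(
    pool_X, lambda i: int(true_y[i]), init_idx=[0, 50], n_classes=2, n_rounds=5
)
```

In this sketch the cost per round grows linearly with the pool size, so a pool of billions of items would need some form of retrieval in place of the full scoring pass.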


