Active Learning of Non-semantic Speech Tasks with Pretrained Models

10/31/2022
by Harlin Lee, et al.

Pretraining neural networks on massive unlabeled datasets has become popular because it equips deep models with a better prior for solving downstream tasks. However, this approach generally assumes that annotated data of sufficient size is available for the downstream tasks. In this work, we propose ALOE, a novel system for improving the data- and label-efficiency of non-semantic speech tasks with active learning (AL). ALOE combines pre-trained models with active learning to label data incrementally and learn classifiers for downstream tasks, mitigating the need to acquire labeled data beforehand. We demonstrate the effectiveness of ALOE on a wide range of tasks, uncertainty-based acquisition functions, and model architectures. Training a linear classifier on top of a frozen encoder with ALOE achieves performance comparable to several baselines that use the entire labeled dataset.
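The abstract describes the core loop: embed audio with a frozen pre-trained encoder, train a lightweight linear classifier, and use an uncertainty-based acquisition function to decide which examples to label next. Below is a minimal sketch of that loop in Python, with entropy as the acquisition function; the array names, pool sizes, and the synthetic oracle are illustrative assumptions, not the paper's actual setup.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Assumed setup: `embeddings` are frozen-encoder features for the unlabeled
# pool, and `oracle_labels` stands in for a human annotator.
rng = np.random.default_rng(0)
n, d, n_classes = 1000, 128, 5
embeddings = rng.normal(size=(n, d))
oracle_labels = rng.integers(0, n_classes, size=n)

labeled = list(rng.choice(n, size=30, replace=False))  # small seed set
unlabeled = [i for i in range(n) if i not in labeled]
batch_size, rounds = 20, 5

clf = LogisticRegression(max_iter=1000)  # the linear classifier ("head")
for _ in range(rounds):
    # Train only the linear head on the labeled subset; the encoder is frozen.
    clf.fit(embeddings[labeled], oracle_labels[labeled])

    # Entropy-based acquisition: query the pool examples the classifier is
    # least certain about.
    probs = clf.predict_proba(embeddings[unlabeled])
    entropy = -(probs * np.log(probs + 1e-12)).sum(axis=1)
    query = np.argsort(entropy)[-batch_size:]  # positions within `unlabeled`

    # Move the queried examples into the labeled set (pop in descending
    # order so earlier pops don't shift later indices).
    for idx in sorted(query, reverse=True):
        labeled.append(unlabeled.pop(idx))
```

Because the encoder stays frozen, the embeddings can be computed once and cached, which keeps each acquisition round cheap; only the small linear head is refit between labeling rounds.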


Related research

11/07/2022 · AfroLM: A Self-Active Learning-based Multilingual Pretrained Language Model for 23 African Languages
In recent years, multilingual pre-trained language models have gained pr...

04/16/2021 · Bayesian Active Learning with Pretrained Language Models
Active Learning (AL) is a method to iteratively select data for annotati...

04/18/2022 · Active Learning Helps Pretrained Models Learn the Intended Task
Models can fail in unpredictable ways during deployment due to task ambi...

09/27/2021 · BigSSL: Exploring the Frontier of Large-Scale Semi-Supervised Learning for Automatic Speech Recognition
We summarize the results of a host of efforts using giant automatic spee...

03/02/2023 · On the Provable Advantage of Unsupervised Pretraining
Unsupervised pretraining, which learns a useful representation using a l...

04/28/2023 · Speak, Memory: An Archaeology of Books Known to ChatGPT/GPT-4
In this work, we carry out a data archaeology to infer books that are kn...

01/26/2022 · TrustAL: Trustworthy Active Learning using Knowledge Distillation
Active learning can be defined as iterations of data labeling, model tra...
