Active Learning for Speech Recognition: the Power of Gradients

12/10/2016
by Jiaji Huang, et al.

In training speech recognition systems, labeling audio clips can be expensive, and not all data is equally valuable. Active learning aims to label only the most informative samples to reduce cost. For speech recognition, confidence scores and other likelihood-based active learning methods have been shown to be effective. Gradient-based active learning methods, however, remain poorly understood. This work investigates the Expected Gradient Length (EGL) approach in active learning for end-to-end speech recognition. We justify EGL from a variance reduction perspective, and observe that EGL's measure of informativeness picks novel samples that are uncorrelated with confidence scores. Experimentally, we show that, compared to random sampling, EGL can reduce word errors by 11% or, alternatively, reduce the number of samples to label by 50%.
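To make the EGL idea concrete, the following is a minimal sketch of the general technique, not the paper's implementation. It assumes a binary logistic regression model as a stand-in for the end-to-end speech recognizer, and the names `expected_gradient_length`, `select_for_labeling`, `w`, and `pool` are hypothetical. For each unlabeled sample, it averages the norm of the loss gradient over the possible labels, weighted by the model's own predicted probabilities, and then picks the samples with the largest score.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def expected_gradient_length(w, x):
    """EGL score of one unlabeled sample under a logistic regression model.

    Assumption: the log-loss gradient w.r.t. the weights for label y is
    (p - y) * x, so its norm is |p - y| * ||x||.
    """
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
    x_norm = math.sqrt(sum(xi * xi for xi in x))
    egl = 0.0
    # Expectation over the unknown label, using the model's own prediction.
    for y, prob in ((1, p), (0, 1.0 - p)):
        grad_norm = abs(p - y) * x_norm
        egl += prob * grad_norm
    return egl


def select_for_labeling(w, pool, k):
    """Return indices of the k pool samples with the largest EGL score."""
    ranked = sorted(
        range(len(pool)),
        key=lambda i: expected_gradient_length(w, pool[i]),
        reverse=True,
    )
    return ranked[:k]


# Toy weights and unlabeled pool (illustrative values only).
w = [1.0, -1.0]
pool = [[0.1, 0.1], [3.0, 3.0], [5.0, -5.0]]
picked = select_for_labeling(w, pool, 2)
print(picked)
```

In this toy pool, the first two samples receive the same predicted probability (0.5), yet their EGL scores differ because the gradient norm also depends on the input magnitude. This illustrates, in a very simplified setting, how a gradient-based score can rank samples differently from a confidence score.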

research
12/11/2019

Parting with Illusions about Deep Active Learning

Active learning aims to reduce the high labeling cost involved in traini...
research
03/24/2020

A Data-Efficient Sampling Method for Estimating Basins of Attraction Using Hybrid Active Learning (HAL)

Although basins of attraction (BoA) diagrams are an insightful tool for ...
research
06/19/2020

Efficient Active Learning for Automatic Speech Recognition via Augmented Consistency Regularization

The cost of labeling transcriptions for large speech corpora becomes a b...
research
02/25/2019

Interpreting Active Learning Methods Through Information Losses

We propose a new way of interpreting active learning methods by analyzin...
research
05/26/2022

Active Labeling: Streaming Stochastic Gradients

The workhorse of machine learning is stochastic gradient descent. To acc...
research
05/15/2020

Stopping criterion for active learning based on deterministic generalization bounds

Active learning is a framework in which the learning machine can select ...
research
11/12/2020

Medical symptom recognition from patient text: An active learning approach for long-tailed multilabel distributions

We study the problem of medical symptoms recognition from patient text, ...