Semi-Supervised Speech Recognition via Graph-based Temporal Classification

10/29/2020
by Niko Moritz, et al.

Semi-supervised learning has demonstrated promising results in automatic speech recognition (ASR) through self-training, in which a seed ASR model generates pseudo-labels for unlabeled data. The effectiveness of this approach depends largely on pseudo-label accuracy, and typically only the 1-best ASR hypothesis is used. However, alternative ASR hypotheses from an N-best list can provide more accurate labels for an unlabeled speech utterance and also reflect uncertainties of the seed ASR model. In this paper, we propose a generalized form of the connectionist temporal classification (CTC) objective that accepts a graph representation of the training targets. The newly proposed graph-based temporal classification (GTC) objective is applied to self-training with WFST-based supervision, which is generated from an N-best list of pseudo-labels. In this setup, GTC is used to learn not only a temporal alignment, as in CTC, but also a label alignment to obtain the optimal pseudo-label sequence from the weighted graph. Results show that this approach can effectively exploit an N-best list of pseudo-labels with associated scores, outperforming standard pseudo-labeling by a large margin, with ASR results close to an oracle experiment in which the best hypotheses of the N-best lists are selected manually.
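As a rough illustration of training against several weighted pseudo-label hypotheses instead of a single one, the sketch below computes a standard CTC forward likelihood for each hypothesis and combines them with a weighted log-sum. This is not the GTC implementation from the paper: GTC operates on a single WFST-derived graph, whereas this sketch assumes each N-best hypothesis is kept as a separate, unmerged path. The function names (`ctc_log_likelihood`, `nbest_loss`), the use of blank index 0, and the normalized positive hypothesis weights are all assumptions made for the example.

```python
import math

NEG_INF = float("-inf")


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) for a list of log-values."""
    m = max(values)
    if m == NEG_INF:
        return NEG_INF
    return m + math.log(sum(math.exp(v - m) for v in values))


def ctc_log_likelihood(log_probs, labels, blank=0):
    """log p(labels | x) under CTC, computed with the forward algorithm.

    log_probs: list of T frames, each a list of per-symbol log-probabilities.
    labels:    list of non-blank symbol indices.
    """
    # Extended label sequence with blanks: b, l1, b, l2, ..., b
    ext = [blank]
    for label in labels:
        ext += [label, blank]
    num_states = len(ext)

    # Initialization: a path may start in the first blank or the first label.
    alpha = [NEG_INF] * num_states
    alpha[0] = log_probs[0][ext[0]]
    if num_states > 1:
        alpha[1] = log_probs[0][ext[1]]

    for t in range(1, len(log_probs)):
        new_alpha = [NEG_INF] * num_states
        for s in range(num_states):
            candidates = [alpha[s]]              # stay in the same state
            if s >= 1:
                candidates.append(alpha[s - 1])  # advance by one state
            # Skip over a blank only between two different labels.
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                candidates.append(alpha[s - 2])
            new_alpha[s] = logsumexp(candidates) + log_probs[t][ext[s]]
        alpha = new_alpha

    # A path may end in the last label or the trailing blank.
    finals = [alpha[num_states - 1]]
    if num_states > 1:
        finals.append(alpha[num_states - 2])
    return logsumexp(finals)


def nbest_loss(log_probs, nbest):
    """Negative log of the weighted sum of CTC likelihoods over hypotheses.

    nbest: list of (labels, weight) pairs; weights are assumed positive.
    """
    terms = [math.log(weight) + ctc_log_likelihood(log_probs, labels)
             for labels, weight in nbest]
    return -logsumexp(terms)


# Toy input: 2 frames, 3 symbols (blank, 1, 2), uniform posteriors.
uniform = [[math.log(1.0 / 3.0)] * 3 for _ in range(2)]
loss = nbest_loss(uniform, [([1], 0.5), ([1, 2], 0.5)])
```

Because the loss sums over hypotheses, the gradient would favor whichever hypothesis the model's posteriors already support best, which gives a loose intuition for the label alignment described in the abstract. Merging shared prefixes into a single graph, as a WFST representation allows, is not shown here.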

research
03/01/2022

Extended Graph Temporal Classification for Multi-Speaker End-to-End ASR

Graph-based temporal classification (GTC), a generalized form of the con...
research
06/16/2021

Momentum Pseudo-Labeling for Semi-Supervised Speech Recognition

Pseudo-labeling (PL) has been shown to be effective in semi-supervised a...
research
04/20/2018

Graph-based Hypothesis Generation for Parallax-tolerant Image Stitching

The seam-driven approach has been proven fairly effective for parallax-t...
research
10/11/2021

Advancing Momentum Pseudo-Labeling with Conformer and Initialization Strategy

Pseudo-labeling (PL), a semi-supervised learning (SSL) method where a se...
research
08/08/2019

Exploiting semi-supervised training through a dropout regularization in end-to-end speech recognition

In this paper, we explore various approaches for semi supervised learnin...
research
11/01/2021

Sequence Transduction with Graph-based Supervision

The recurrent neural network transducer (RNN-T) objective plays a major ...
research
11/02/2022

InterMPL: Momentum Pseudo-Labeling with Intermediate CTC Loss

This paper presents InterMPL, a semi-supervised learning method of end-t...
