Sequence Transduction with Graph-based Supervision

11/01/2021
by   Niko Moritz, et al.
0

The recurrent neural network transducer (RNN-T) objective plays a major role in building today's best automatic speech recognition (ASR) systems for production. Similarly to the connectionist temporal classification (CTC) objective, the RNN-T loss uses specific rules that define how a set of alignments is generated to form a lattice for the full-sum training. However, it is yet largely unknown if these rules are optimal and do lead to the best possible ASR results. In this work, we present a new transducer objective function that generalizes the RNN-T loss to accept a graph representation of the labels, thus providing a flexible and efficient framework to manipulate training lattices, for example for restricting alignments or studying different transition rules. We demonstrate that transducer-based ASR with CTC-like lattice achieves better results compared to standard RNN-T, while also ensuring a strictly monotonic alignment, which will allow better optimization of the decoding procedure. For example, the proposed CTC-like transducer system achieves a word error rate of 5.9 corresponding to an improvement of 4.8 system.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
04/19/2022

An Investigation of Monotonic Transducers for Large-Scale Automatic Speech Recognition

The two most popular loss functions for streaming end-to-end automatic s...
research
07/29/2015

EESEN: End-to-End Speech Recognition using Deep RNN Models and WFST-based Decoding

The performance of automatic speech recognition (ASR) has improved treme...
research
10/29/2020

Semi-Supervised Speech Recognition via Graph-based Temporal Classification

Semi-supervised learning has demonstrated promising results in automatic...
research
11/05/2020

Alignment Restricted Streaming Recurrent Neural Network Transducer

There is a growing interest in the speech community in developing Recurr...
research
03/18/2023

Powerful and Extensible WFST Framework for RNN-Transducer Losses

This paper presents a framework based on Weighted Finite-State Transduce...
research
07/26/2023

Say Goodbye to RNN-T Loss: A Novel CIF-based Transducer Architecture for Automatic Speech Recognition

RNN-T models are widely used in ASR, which rely on the RNN-T loss to ach...
research
08/03/2022

VQ-T: RNN Transducers using Vector-Quantized Prediction Network States

Beam search, which is the dominant ASR decoding algorithm for end-to-end...

Please sign up or login with your details

Forgot password? Click here to reset