Bayes Risk Transducer: Transducer with Controllable Alignment Prediction

08/19/2023
by   Jinchuan Tian, et al.

Automatic speech recognition (ASR) based on transducers is widely used. In training, a transducer maximizes the summed posteriors of all paths. The path with the highest posterior is commonly defined as the predicted alignment between the speech and the transcription. While the vanilla transducer does not have a prior preference for any of the valid paths, this work intends to enforce the preferred paths and achieve controllable alignment prediction. Specifically, this work proposes the Bayes Risk Transducer (BRT), which uses a Bayes risk function to assign lower risk values to the preferred paths so that the predicted alignment is more likely to satisfy specific desired properties. We further demonstrate that these predicted alignments with intentionally designed properties can provide practical advantages over the vanilla transducer. Experimentally, the proposed BRT saves inference cost by up to 46% for non-streaming ASR and reduces overall system latency by 41% for streaming ASR.
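To make the idea of summing over paths and then preferring some of them concrete, here is a minimal sketch on a toy transducer lattice. Everything in it is an assumption for illustration: the emission probabilities are made up, the function names are hypothetical, and the weight `exp(-lam * last_emission_frame)` is only a stand-in for a risk function that favors paths emitting their last label early. It is not the risk function defined in the paper, and it enumerates paths by brute force rather than using an efficient training recursion.

```python
import math
from itertools import combinations_with_replacement

# Toy emission probabilities (made-up numbers): P_LABEL[t][u] is the probability
# of emitting the next label at lattice node (t, u); blank takes the remaining mass.
P_LABEL = [[0.1, 0.2], [0.3, 0.2], [0.6, 0.5], [0.7, 0.8]]
T, U = len(P_LABEL), len(P_LABEL[0])  # 4 frames, 2 labels


def p_blank(t, u):
    # Once all U labels have been emitted, blank is the only option.
    return 1.0 if u == U else 1.0 - P_LABEL[t][u]


def all_paths():
    # A path is identified by the non-decreasing frame index at which
    # each of the U labels is emitted.
    return list(combinations_with_replacement(range(T), U))


def path_prob(emit_frames):
    # Multiply label and blank probabilities along one path.
    prob, u = 1.0, 0
    for t in range(T):
        while u < U and emit_frames[u] == t:
            prob *= P_LABEL[t][u]
            u += 1
        prob *= p_blank(t, u)  # blank moves to the next frame (or terminates)
    return prob


def forward_total():
    # Standard forward recursion: sums the probability of all paths at once.
    alpha = [[0.0] * (U + 1) for _ in range(T)]
    alpha[0][0] = 1.0
    for t in range(T):
        for u in range(U + 1):
            if t > 0:
                alpha[t][u] += alpha[t - 1][u] * p_blank(t - 1, u)
            if u > 0:
                alpha[t][u] += alpha[t][u - 1] * P_LABEL[t][u - 1]
    return alpha[T - 1][U] * p_blank(T - 1, U)


def path_weight(emit_frames, lam):
    # Hypothetical preference: paths whose last label is emitted later
    # (higher assumed risk) receive a smaller weight.
    return math.exp(-lam * emit_frames[-1])


def weighted_objective(lam):
    # Vanilla objective when lam == 0; preference-weighted sum otherwise.
    return sum(path_prob(p) * path_weight(p, lam) for p in all_paths())


def best_path(lam):
    # Highest-scoring path under the (weighted) path score.
    return max(all_paths(), key=lambda p: path_prob(p) * path_weight(p, lam))


if __name__ == "__main__":
    print("vanilla sum over paths :", weighted_objective(0.0))
    print("forward recursion      :", forward_total())
    print("weighted sum (lam=1.0) :", weighted_objective(1.0))
    print("best path, vanilla     :", best_path(0.0))
    print("best path, weighted    :", best_path(1.0))
```

With `lam = 0` every path has weight 1, so the weighted sum reduces to the vanilla summed posterior and matches the forward recursion. With `lam > 0` the score shifts toward paths that finish emitting earlier, which illustrates in miniature how a preference over paths can steer which alignment scores highest.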


