Towards Effective and Compact Contextual Representation for Conformer Transducer Speech Recognition Systems

06/23/2023
by   Mingyu Cui, et al.

Current ASR systems are mainly trained and evaluated at the utterance level, although long-range cross-utterance context can also be incorporated. A key task is to derive a suitable compact representation of the most relevant history contexts. In contrast to previous research based either on LSTM-RNN encoded histories, which attenuate information from longer-range contexts, or on frame-level concatenation of Transformer context embeddings, in this paper compact low-dimensional cross-utterance contextual features are learned in the Conformer-Transducer encoder using specially designed attention pooling layers applied over efficiently cached history vectors of preceding utterances. Experiments on the 1000-hour GigaSpeech corpus demonstrate that the proposed contextualized streaming Conformer-Transducers outperform the baseline using utterance-internal context only, with statistically significant WER reductions of 0.7
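The core idea described above, attention pooling over cached history vectors of preceding utterances to produce a single compact context representation, can be sketched roughly as follows. This is a minimal NumPy illustration under assumed shapes; the function name `attention_pool`, the key projection `W_k`, and the single learned query vector are hypothetical simplifications for exposition, not the paper's actual architecture.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention_pool(history, query, W_k):
    """Pool cached per-utterance history vectors into one compact context vector.

    history: (n_utts, d) cached context vectors for preceding utterances
             (illustrative stand-in for the cached encoder states)
    query:   (d,) learned query vector (hypothetical)
    W_k:     (d, d) key projection matrix (hypothetical)
    """
    keys = history @ W_k                          # (n_utts, d)
    scores = keys @ query / np.sqrt(len(query))   # scaled dot-product scores
    weights = softmax(scores)                     # attention over past utterances
    return weights @ history                      # (d,) compact context vector
```

Because the pooled output is a convex combination of the cached vectors, the history cache stays fixed-size per utterance while still letting the model weight distant utterances by relevance rather than attenuating them by recency.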

Related research:

- Hierarchical Transformer-based Large-Context End-to-end ASR with Large-Context Knowledge Distillation (02/16/2021)
- Improving Transformer-based Conversational ASR by Inter-Sentential Attention Mechanism (07/02/2022)
- Transformer Language Models with LSTM-based Cross-utterance Information Representation (02/12/2021)
- Conversational speech recognition leveraging effective fusion methods for cross-utterance language modeling (11/05/2021)
- Leveraging Cross-Utterance Context For ASR Decoding (06/29/2023)
- HiGRU: Hierarchical Gated Recurrent Units for Utterance-level Emotion Recognition (04/09/2019)
- How to marry a star: probabilistic constraints for meaning in context (09/16/2020)
