Modality Confidence Aware Training for Robust End-to-End Spoken Language Understanding

07/22/2023
by Suyoun Kim, et al.

End-to-end (E2E) spoken language understanding (SLU) systems that generate a semantic parse directly from speech have recently become more promising. This approach uses a single model that draws on audio and text representations from pre-trained automatic speech recognition (ASR) models, and it outperforms traditional pipeline SLU systems in on-device streaming scenarios. However, E2E SLU systems remain weak when the quality of the text representation is low because of ASR transcription errors. To overcome this issue, we propose a novel E2E SLU system that improves robustness to ASR errors by fusing audio and text representations according to the estimated modality confidence of the ASR hypotheses. We introduce two novel techniques: 1) an effective method to encode the quality of ASR hypotheses and 2) an effective approach to integrate that encoding into E2E SLU models. We show accuracy improvements on the STOP dataset and share an analysis that demonstrates the effectiveness of our approach.
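The abstract does not spell out how the fusion is computed, so the following is only a minimal sketch of the general idea of confidence-weighted fusion, not the authors' model. It assumes a single scalar confidence in [0, 1] for the ASR hypothesis and a simple convex combination of two same-length vectors; the function name `fuse_by_confidence` and the example values are hypothetical.

```python
from typing import List


def fuse_by_confidence(
    audio: List[float], text: List[float], confidence: float
) -> List[float]:
    """Blend two same-length vectors.

    `confidence` is an assumed scalar in [0, 1] estimating how reliable the
    ASR text is: 1.0 keeps only the text vector, 0.0 keeps only the audio
    vector. This is an illustrative stand-in, not the paper's fusion layer.
    """
    if len(audio) != len(text):
        raise ValueError("audio and text vectors must have the same length")
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must lie in [0, 1]")
    # Convex combination: weight the text side by confidence,
    # and the audio side by the remainder.
    return [confidence * t + (1.0 - confidence) * a for a, t in zip(audio, text)]


# Hypothetical toy representations for one utterance.
audio_vec = [1.0, 0.0, 2.0]
text_vec = [0.0, 4.0, 2.0]

trusted_text = fuse_by_confidence(audio_vec, text_vec, 1.0)    # text only
untrusted_text = fuse_by_confidence(audio_vec, text_vec, 0.0)  # audio only
halfway = fuse_by_confidence(audio_vec, text_vec, 0.5)         # even blend
```

With this weighting, a low-confidence hypothesis (likely to contain transcription errors) contributes less to the fused representation, which is the intuition the abstract describes. A trained system would presumably learn both the confidence estimate and the fusion jointly rather than use a fixed formula.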

research
04/04/2022

Deliberation Model for On-Device Spoken Language Understanding

We propose a novel deliberation-based approach to end-to-end (E2E) spoke...
research
10/27/2022

Token-level Sequence Labeling for Spoken Language Understanding using Compositional End-to-End Models

End-to-end spoken language understanding (SLU) systems are gaining popul...
research
04/13/2021

Bridging the Gap Between Clean Data Training and Real-World Inference for Spoken Language Understanding

Spoken language understanding (SLU) system usually consists of various p...
research
04/22/2022

WaBERT: A Low-resource End-to-end Model for Spoken Language Understanding and Speech-to-BERT Alignment

Historically lower-level tasks such as automatic speech recognition (ASR...
research
04/07/2021

Speak or Chat with Me: End-to-End Spoken Language Understanding System with Flexible Inputs

A major focus of recent research in spoken language understanding (SLU) ...
research
06/18/2019

Curriculum-based transfer learning for an effective end-to-end spoken language understanding and domain portability

We present an end-to-end approach to extract semantic concepts directly ...
research
11/08/2020

Listen, Look and Deliberate: Visual context-aware speech recognition using pre-trained text-video representations

In this study, we try to address the problem of leveraging visual signal...