Integrating Pretrained ASR and LM to Perform Sequence Generation for Spoken Language Understanding

07/20/2023
by   Siddhant Arora, et al.
0

There has been an increased interest in the integration of pretrained speech recognition (ASR) and language models (LM) into the SLU framework. However, prior methods often struggle with a vocabulary mismatch between pretrained models, and LM cannot be directly utilized as they diverge from its NLU formulation. In this study, we propose a three-pass end-to-end (E2E) SLU system that effectively integrates ASR and LM subnetworks into the SLU formulation for sequence generation tasks. In the first pass, our architecture predicts ASR transcripts using the ASR subnetwork. This is followed by the LM subnetwork, which makes an initial SLU prediction. Finally, in the third pass, the deliberation subnetwork conditions on representations from the ASR and LM subnetworks to make the final prediction. Our proposed three-pass SLU system shows improved performance over cascaded and E2E SLU models on two benchmark SLU datasets, SLURP and SLUE, especially on acoustically challenging utterances.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
04/04/2022

Deliberation Model for On-Device Spoken Language Understanding

We propose a novel deliberation-based approach to end-to-end (E2E) spoke...
research
05/02/2023

A Study on the Integration of Pipeline and E2E SLU systems for Spoken Semantic Parsing toward STOP Quality Challenge

Recently there have been efforts to introduce new benchmark tasks for sp...
research
10/27/2022

Token-level Sequence Labeling for Spoken Language Understanding using Compositional End-to-End Models

End-to-end spoken language understanding (SLU) systems are gaining popul...
research
02/16/2022

Knowledge Transfer from Large-scale Pretrained Language Models to End-to-end Speech Recognizers

End-to-end speech recognition is a promising technology for enabling com...
research
07/13/2023

Leveraging Pretrained ASR Encoders for Effective and Efficient End-to-End Speech Intent Classification and Slot Filling

We study speech intent classification and slot filling (SICSF) by propos...
research
11/28/2022

E2E Segmentation in a Two-Pass Cascaded Encoder ASR Model

We explore unifying a neural segmenter with two-pass cascaded encoder AS...
research
02/02/2022

RescoreBERT: Discriminative Speech Recognition Rescoring with BERT

Second-pass rescoring is an important component in automatic speech reco...

Please sign up or login with your details

Forgot password? Click here to reset