Bidirectional Representations for Low Resource Spoken Language Understanding

11/24/2022
by Quentin Meeus, et al.

Most spoken language understanding systems use a pipeline composed of an automatic speech recognition interface and a natural language understanding module. This approach forces hard decisions when converting continuous inputs into discrete language symbols. Instead, we propose a representation model that encodes speech into rich bidirectional encodings usable for downstream tasks such as intent prediction. The model is trained with a masked language modelling objective, so the representations benefit from both left and right context. We show that, before fine-tuning, the resulting encodings outperform comparable models on multiple datasets, and that fine-tuning the top layers of the representation model improves on the current state of the art on the Fluent Speech Commands dataset, including in a low-data regime where only a limited amount of labelled data is used for training. Furthermore, we propose class attention as a spoken language understanding module that is efficient in both speed and number of parameters. Class attention can also be used to visually explain the model's predictions, which helps in understanding how the model reaches them. We perform experiments in English and in Dutch.
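The abstract does not spell out how class attention is computed, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes one learned query vector per intent class that attends over the frame-level encodings, followed by a per-class output vector that turns the pooled encoding into a logit. The names `class_attention`, `class_queries` and `class_outputs`, and all toy values, are hypothetical.

```python
import math
from typing import List, Tuple

Vector = List[float]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def class_attention(
    encodings: List[Vector],      # one encoding per speech frame
    class_queries: List[Vector],  # assumed: one learned query per class
    class_outputs: List[Vector],  # assumed: one learned output vector per class
) -> Tuple[Vector, List[Vector]]:
    """Return per-class logits and per-class attention weights over frames."""
    dim = len(encodings[0])
    scale = 1.0 / math.sqrt(dim)
    logits, attention = [], []
    for query, output in zip(class_queries, class_outputs):
        # Scaled dot-product scores between the class query and every frame.
        weights = softmax([dot(query, h) * scale for h in encodings])
        # Attention-weighted average of the frame encodings.
        pooled = [
            sum(w * h[i] for w, h in zip(weights, encodings))
            for i in range(dim)
        ]
        logits.append(dot(output, pooled))
        attention.append(weights)
    return logits, attention


# Toy example: three frames of dimension two, two classes.
encodings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
class_queries = [[2.0, 0.0], [0.0, 2.0]]
class_outputs = [[1.0, 0.0], [0.0, 1.0]]
logits, attention = class_attention(encodings, class_queries, class_outputs)
```

Under these assumptions, each row of `attention` is a distribution over frames for one class, which is the kind of quantity that could be plotted against time to show which parts of the utterance contributed to a prediction.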


