Augmenting text for spoken language understanding with Large Language Models

09/17/2023
by Roshan Sharma et al.

Spoken semantic parsing (SSP) involves generating machine-comprehensible parses from input speech. Training robust models for application domains represented in existing training data, or extending to new domains, requires matched triplets of speech, transcript, and semantic parse, which are expensive to obtain. In this paper, we address this challenge by examining methods that can use transcript-semantic parse data (unpaired text) without corresponding speech. First, when unpaired text is drawn from existing textual corpora, Joint Audio Text (JAT) and Text-to-Speech (TTS) are compared as ways to generate speech representations for unpaired text. Experiments on the STOP dataset show that unpaired text from existing and new domains improves performance by 2% absolute. Second, we consider the setting where unpaired text is not available in existing textual corpora. We propose prompting Large Language Models (LLMs) to generate unpaired text for existing and new domains. Experiments show that examples and words that co-occur with intents can be used to generate unpaired text with Llama 2.0. Using the generated text with JAT and TTS for spoken semantic parsing improves EM on STOP by 1.4% absolute.
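The abstract describes prompting an LLM with exemplar utterances and words that co-occur with each intent to generate unpaired text. A minimal sketch of how such a prompt might be assembled is below; the function name, prompt wording, and the `IN:CREATE_ALARM` intent label are illustrative assumptions, not the authors' actual prompt (the STOP dataset does use `IN:`-prefixed intent labels).

```python
# Hypothetical sketch of the "exemplars + intent co-occurring words" prompting
# idea for generating unpaired text; not the paper's actual prompt template.

def build_prompt(intent, cooccurring_words, exemplars, n=5):
    """Assemble a text-generation prompt for an LLM such as Llama 2.

    intent            -- target intent label (e.g. "IN:CREATE_ALARM")
    cooccurring_words -- words observed to co-occur with this intent
    exemplars         -- a few example utterances for this intent
    n                 -- number of new utterances to request
    """
    lines = [
        f"Generate {n} new user utterances for the intent '{intent}'.",
        f"Words that often appear with this intent: {', '.join(cooccurring_words)}.",
        "Examples:",
    ]
    lines += [f"- {ex}" for ex in exemplars]
    lines.append("New utterances:")
    return "\n".join(lines)

prompt = build_prompt(
    intent="IN:CREATE_ALARM",
    cooccurring_words=["alarm", "wake", "tomorrow"],
    exemplars=["set an alarm for 7 am", "wake me up at noon"],
)
print(prompt)
```

The generated utterances would then be paired with synthesized speech (TTS) or used directly as text (JAT) to train the spoken semantic parser.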


Related research

07/05/2022 · ASR-Generated Text for Language Model Pre-training Applied to Speech Tasks
We aim at improving spoken language modeling (LM) using very large amoun...

07/06/2020 · Contextualized Spoken Word Representations from Convolutional Autoencoders
A lot of work has been done recently to build sound language models for ...

05/24/2023 · LMs with a Voice: Spoken Language Modeling beyond Speech Tokens
We present SPECTRON, a novel approach to adapting pre-trained language m...

10/08/2020 · On the Role of Style in Parsing Speech with Neural Models
The differences in written text and conversational speech are substantia...

05/22/2023 · Zero-Shot End-to-End Spoken Language Understanding via Cross-Modal Selective Self-Training
End-to-end (E2E) spoken language understanding (SLU) is constrained by t...

05/22/2023 · Textually Pretrained Speech Language Models
Speech language models (SpeechLMs) process and generate acoustic data on...

06/02/2023 · BabySLM: language-acquisition-friendly benchmark of self-supervised spoken language models
Self-supervised techniques for learning speech representations have been...
