When can I Speak? Predicting initiation points for spoken dialogue agents

08/07/2022
by Siyan Li, et al.

Current spoken dialogue systems initiate their turns after a long period of silence (700-1000ms), which leads to little real-time feedback, sluggish responses, and an overall stilted conversational flow. Humans typically respond within 200ms, and successfully predicting initiation points in advance would allow spoken dialogue agents to do the same. In this work, we predict the lead-time to initiation using prosodic features from a pre-trained speech representation model (wav2vec 1.0) operating on user audio and word features from a pre-trained language model (GPT-2) operating on incremental transcriptions. To evaluate prediction errors, we propose two metrics defined over the predicted and true lead times. We train and evaluate the models on the Switchboard Corpus and find that our method outperforms features from prior work on both metrics and vastly outperforms the common approach of waiting for 700ms of silence.
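The abstract contrasts lead-time prediction with the common policy of waiting for 700ms of silence. The sketch below illustrates both quantities under stated assumptions: user audio is reduced to a frame-level boolean voice-activity sequence with a hypothetical 10ms hop, and the function names `silence_baseline_initiation` and `lead_time_targets` are illustrative only. This is not the authors' implementation and does not reproduce either of their proposed metrics.

```python
from typing import List, Optional

FRAME_MS = 10  # hypothetical frame hop of the voice-activity sequence (assumption)


def silence_baseline_initiation(vad: List[bool], threshold_ms: int = 700,
                                frame_ms: int = FRAME_MS) -> Optional[int]:
    """Return the time (ms) at which a silence-threshold policy would start
    speaking: the first moment at which the most recent `threshold_ms` of
    user audio have all been silent. Returns None if that never happens."""
    needed = threshold_ms // frame_ms  # consecutive silent frames required
    run = 0  # length of the current run of silent frames
    for i, active in enumerate(vad):
        run = 0 if active else run + 1  # any user speech resets the silence run
        if run >= needed:
            return (i + 1) * frame_ms  # frame i ends at (i + 1) * frame_ms
    return None


def lead_time_targets(num_frames: int, initiation_ms: int,
                      frame_ms: int = FRAME_MS) -> List[int]:
    """Lead time (ms) remaining until a given true initiation point, measured
    at the end of each frame; frames past the initiation point are clipped to 0."""
    return [max(initiation_ms - (i + 1) * frame_ms, 0) for i in range(num_frames)]


if __name__ == "__main__":
    # Hypothetical example: the user speaks for 1000 ms, then is silent for 1000 ms.
    vad = [True] * 100 + [False] * 100
    print(silence_baseline_initiation(vad))  # 1700
    print(lead_time_targets(5, 30))          # [20, 10, 0, 0, 0]
```

In this toy example the user stops speaking at 1000 ms and the 700ms-silence policy only fires at 1700 ms, whereas a human-like response within 200ms would begin by about 1200 ms. A model that predicts the lead time ahead of the initiation point can, in principle, schedule its turn without waiting for the silence threshold to expire.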
