Voice Activity Projection: Self-supervised Learning of Turn-taking Events

05/19/2022
by Erik Ekstedt, et al.

Modeling turn-taking in dialog can be viewed as modeling the dynamics of the interlocutors' voice activity. We extend prior work and define the predictive task of Voice Activity Projection, a general, self-supervised objective, as a way to train turn-taking models without the need for labeled data. We highlight a theoretical weakness in prior approaches, arguing for the need to model the dependency between voice activity events in the projection window. We propose four zero-shot tasks, related to the prediction of upcoming turn-shifts and backchannels, and show that the proposed model outperforms prior work.

research
10/21/2020

TurnGPT: a Transformer-based Language Model for Predicting Turn-taking in Spoken Dialog

Syntactic and pragmatic completeness is known to be important for turn-t...
research
10/27/2021

Zero-shot Voice Conversion via Self-supervised Prosody Representation Learning

Voice Conversion (VC) for unseen speakers, also known as zero-shot VC, i...
research
05/03/2023

What makes a good pause? Investigating the turn-holding effects of fillers

Filled pauses (or fillers), such as "uh" and "um", are frequent in spont...
research
06/22/2023

Toward Leveraging Pre-Trained Self-Supervised Frontends for Automatic Singing Voice Understanding Tasks: Three Case Studies

Automatic singing voice understanding tasks, such as singer identificati...
research
09/23/2021

Simple and Effective Zero-shot Cross-lingual Phoneme Recognition

Recent progress in self-training, self-supervised pretraining and unsupe...
research
10/27/2021

Neural Analysis and Synthesis: Reconstructing Speech from Self-Supervised Representations

We present a neural analysis and synthesis (NANSY) framework that can ma...
research
05/29/2023

Automatic Evaluation of Turn-taking Cues in Conversational Speech Synthesis

Turn-taking is a fundamental aspect of human communication where speaker...
