BERT, can HE predict contrastive focus? Predicting and controlling prominence in neural TTS using a language model

07/04/2022
by Brooke Stephenson, et al.

Several recent studies have tested the use of transformer language model representations to infer prosodic features for text-to-speech synthesis (TTS). While these studies have explored prosody in general, this work looks specifically at the prediction of contrastive focus on personal pronouns. This task is particularly challenging because predicting focus correctly often requires semantic, discursive and/or pragmatic knowledge. We collect a corpus of utterances containing contrastive focus and evaluate the accuracy of a BERT model, fine-tuned to predict quantized acoustic prominence features, on these samples. We also investigate how past utterances can provide relevant information for this prediction. Furthermore, we evaluate the controllability of pronoun prominence in a TTS model conditioned on acoustic prominence features.
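To make "quantized acoustic prominence features" concrete, here is a minimal sketch of how a continuous per-word prominence score could be turned into a discrete label that a classifier can predict. The abstract does not say how the features are quantized, so the quantile binning, the number of bins, the function names (`fit_bin_edges`, `quantize`) and the example scores are all assumptions made for illustration, not the authors' method.

```python
from bisect import bisect_right
from statistics import quantiles


def fit_bin_edges(values, n_bins=3):
    """Return n_bins - 1 quantile cut points computed from the given values.

    Assumption: quantile binning is used here only as an example scheme;
    the source does not specify one.
    """
    return quantiles(values, n=n_bins, method="inclusive")


def quantize(value, edges):
    """Map a continuous value to an integer label in the range 0..len(edges)."""
    return bisect_right(edges, value)


# Hypothetical per-word prominence scores (e.g. normalized pitch or energy).
# These numbers are invented for illustration, not taken from the paper.
words = ["I", "said", "HE", "did", "it"]
scores = [0.1, -0.3, 1.8, -0.2, 0.0]

edges = fit_bin_edges(scores, n_bins=3)
labels = [quantize(s, edges) for s in scores]

for word, score, label in zip(words, scores, labels):
    print(f"{word:>5}  score={score:+.1f}  bin={label}")
```

With labels of this kind, prominence prediction becomes a per-token classification problem, and the same discrete labels can in principle be supplied by hand to a TTS model to request more or less prominence on a given pronoun.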

research
07/05/2023

Emoji Prediction using Transformer Models

In recent years, the use of emojis in social media has increased dramati...
research
04/29/2021

The Interspeech Zero Resource Speech Challenge 2021: Spoken language modelling

We present the Zero Resource Speech Challenge 2021, which asks participa...
research
04/06/2022

SOMOS: The Samsung Open MOS Dataset for the Evaluation of Neural Text-to-Speech Synthesis

In this work, we present the SOMOS dataset, the first large-scale mean o...
research
04/15/2019

Semantic query-by-example speech search using visual grounding

A number of recent studies have started to investigate how speech system...
research
10/25/2022

Contrastive Search Is What You Need For Neural Text Generation

Generating text with autoregressive language models (LMs) is of great im...
research
08/07/2023

A Hybrid CNN-Transformer Architecture with Frequency Domain Contrastive Learning for Image Deraining

Image deraining is a challenging task that involves restoring degraded i...
research
09/29/2020

Improving Device Directedness Classification of Utterances with Semantic Lexical Features

User interactions with personal assistants like Alexa, Google Home and S...
