SpeechPainter: Text-conditioned Speech Inpainting

02/15/2022
by Zalán Borsos et al.

We propose SpeechPainter, a model for filling in gaps of up to one second in speech samples by leveraging an auxiliary textual input. We demonstrate that the model performs speech inpainting with the appropriate content, while maintaining speaker identity, prosody and recording environment conditions, and generalizing to unseen speakers. Our approach significantly outperforms baselines constructed using adaptive TTS, as judged by human raters in side-by-side preference and MOS tests.
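The core setup described in the abstract, masking a gap of up to one second in a speech sample so a model can fill it in, can be sketched in NumPy. This is an illustrative sketch only, not the authors' code: the 16 kHz sample rate, the function name, and the gap position are all assumptions made for the example.

```python
import numpy as np

SAMPLE_RATE = 16_000  # assumed 16 kHz audio; the paper does not fix this here


def mask_gap(waveform: np.ndarray, start_s: float, duration_s: float):
    """Zero out a gap of `duration_s` seconds starting at `start_s`.

    Returns the masked waveform and a boolean mask marking the gap,
    which an inpainting model would be asked to reconstruct.
    """
    start = int(start_s * SAMPLE_RATE)
    end = start + int(duration_s * SAMPLE_RATE)
    mask = np.zeros(len(waveform), dtype=bool)
    mask[start:end] = True
    masked = waveform.copy()
    masked[mask] = 0.0
    return masked, mask


# Example: 3 seconds of (random, stand-in) audio with a 1-second gap at t = 1 s.
audio = np.random.default_rng(0).standard_normal(3 * SAMPLE_RATE).astype(np.float32)
masked_audio, gap = mask_gap(audio, start_s=1.0, duration_s=1.0)
print(int(gap.sum()))  # number of masked samples: 16000
```

In the paper's setting, the model receives the masked audio together with a transcript covering the gap, and must synthesize the missing second with matching speaker identity, prosody, and recording conditions.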



Related research

11/17/2022 · Any-speaker Adaptive Text-To-Speech Synthesis with Diffusion Models
There has been significant progress in Text-To-Speech (TTS) synthesis ...

08/09/2020 · Speaker Conditional WaveRNN: Towards Universal Neural Vocoder for Unseen Speaker and Recording Conditions
Recent advancements in deep learning led to human-level performance in s...

08/04/2021 · Daft-Exprt: Robust Prosody Transfer Across Speakers for Expressive Speech Synthesis
This paper presents Daft-Exprt, a multi-speaker acoustic model advancing...

05/18/2020 · Attentron: Few-Shot Text-to-Speech Utilizing Attention-Based Variable-Length Embedding
On account of growing demands for personalization, the need for a so-cal...

04/07/2022 · Correcting Misproducted Speech using Spectrogram Inpainting
Learning a new language involves constantly comparing speech productions...

11/15/2018 · Robust universal neural vocoding
This paper introduces a robust universal neural vocoder trained with 74 ...
