A Preliminary Study of a Two-Stage Paradigm for Preserving Speaker Identity in Dysarthric Voice Conversion

06/02/2021
by   Wen-Chin Huang, et al.

We propose a new paradigm for maintaining speaker identity in dysarthric voice conversion (DVC). The poor quality of dysarthric speech can be greatly improved by statistical VC, but because normal speech utterances of a dysarthria patient are nearly impossible to collect, previous work failed to recover the individuality of the patient. In light of this, we suggest a novel two-stage approach for DVC, which is highly flexible in that no normal speech of the patient is required. First, a powerful parallel sequence-to-sequence model converts the input dysarthric speech into normal speech of a reference speaker as an intermediate product. Then, a nonparallel, frame-wise VC model realized with a variational autoencoder converts the speaker identity of the reference speech back to that of the patient, and is assumed to preserve the enhanced quality while doing so. We investigate several design options. Experimental evaluation results demonstrate the potential of our approach to improve the quality of dysarthric speech while maintaining the speaker identity.
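The two-stage flow described in the abstract can be pictured as a simple composition of two converters. The sketch below is only an illustration under stated assumptions: the names `two_stage_dvc`, `toy_seq2seq`, and `toy_framewise` are hypothetical, the feature representation (lists of floats) is invented for the example, and the toy stand-ins are not the authors' models.

```python
from typing import Callable, List

Frame = List[float]      # one acoustic feature frame (hypothetical representation)
Utterance = List[Frame]  # a sequence of frames


def two_stage_dvc(
    dysarthric: Utterance,
    seq2seq_vc: Callable[[Utterance], Utterance],
    framewise_vc: Callable[[Frame], Frame],
) -> Utterance:
    """Compose the two stages; both converters are supplied by the caller."""
    # Stage 1: a sequence-to-sequence model maps the whole dysarthric utterance
    # to normal speech of a reference speaker. The output length may differ.
    reference_like = seq2seq_vc(dysarthric)
    # Stage 2: a frame-wise model converts the speaker identity back to the
    # patient, one frame at a time, so the stage-1 length is kept.
    return [framewise_vc(frame) for frame in reference_like]


# Toy stand-ins (assumptions for illustration only, not real VC models).
def toy_seq2seq(utt: Utterance) -> Utterance:
    return utt[::2]  # keeps every other frame to mimic a change in duration


def toy_framewise(frame: Frame) -> Frame:
    return [x + 1.0 for x in frame]  # shifts features to mimic an identity change


converted = two_stage_dvc([[0.0], [1.0], [2.0], [3.0]], toy_seq2seq, toy_framewise)
```

Because the second stage operates frame by frame, it cannot alter the timing produced by the first stage, which is one way to read the abstract's assumption that the enhanced quality is preserved.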

research
10/15/2021

Towards Identity Preserving Normal to Dysarthric Voice Conversion

We present a voice conversion framework that converts normal speech into...
research
08/24/2018

Voice Conversion with Conditional SampleRNN

Here we present a novel approach to conditioning the SampleRNN generativ...
research
09/05/2023

Evaluating Methods for Ground-Truth-Free Foreign Accent Conversion

Foreign accent conversion (FAC) is a special application of voice conver...
research
10/30/2018

Generative Adversarial Networks for Unpaired Voice Transformation on Impaired Speech

This paper focuses on using voice conversion (VC) to improve the speech ...
research
07/31/2023

VITS2: Improving Quality and Efficiency of Single-Stage Text-to-Speech with Adversarial Learning and Architecture Design

Single-stage text-to-speech models have been actively studied recently, ...
research
05/09/2023

Who is Speaking Actually? Robust and Versatile Speaker Traceability for Voice Conversion

Voice conversion (VC), as a voice style transfer technology, is becoming...
research
10/19/2022

Two-stage training method for Japanese electrolaryngeal speech enhancement based on sequence-to-sequence voice conversion

Sequence-to-sequence (seq2seq) voice conversion (VC) models have greater...
