Lifter Training and Sub-band Modeling for Computationally Efficient and High-Quality Voice Conversion Using Spectral Differentials

02/17/2020
by Takaaki Saeki, et al.

In this paper, we propose computationally efficient and high-quality methods for statistical voice conversion (VC) with direct waveform modification based on spectral differentials. The conventional method with a minimum-phase filter achieves high-quality conversion but requires heavy computation in filtering, because deriving the minimum phase with a fixed lifter based on the Hilbert transform often results in a long-tap filter. Our first method is data-driven lifter training. Since this method takes filter truncation into account during training, it can shorten the tap length of the filter while preserving conversion accuracy. Our second method is sub-band processing, which extends the conventional method from narrow-band (16 kHz) to full-band (48 kHz) VC and can convert a full-band waveform with higher converted-speech quality. Experimental results indicate that 1) the proposed lifter-training method for narrow-band VC can shorten the tap length to 1/16 without degrading the converted-speech quality and 2) the proposed sub-band-processing method for full-band VC improves the converted-speech quality compared with the conventional method.
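As a rough illustration of the conventional baseline mentioned in the abstract, the sketch below derives a minimum-phase filter from a log-amplitude spectrum using the standard fixed cepstral lifter and then optionally truncates it. This is a minimal sketch under stated assumptions, not the authors' implementation: the function name `minimum_phase_filter`, its parameters, and the use of NumPy are hypothetical choices made for illustration, and the proposed lifter training and sub-band processing are not shown.

```python
import numpy as np


def minimum_phase_filter(log_amp, tap_length=None):
    """Hypothetical helper: build a minimum-phase impulse response.

    log_amp    : real log-amplitude spectrum on a full FFT grid of even
                 length N (symmetric around N/2).
    tap_length : if given, keep only the first tap_length taps.
    """
    log_amp = np.asarray(log_amp, dtype=float)
    n_fft = len(log_amp)

    # Real cepstrum of the log-amplitude spectrum.
    cepstrum = np.fft.ifft(log_amp).real

    # Fixed lifter: keep quefrency 0 and N/2, double the causal part,
    # and zero the anti-causal part (standard homomorphic construction).
    lifter = np.zeros(n_fft)
    lifter[0] = 1.0
    lifter[1:n_fft // 2] = 2.0
    lifter[n_fft // 2] = 1.0

    # Complex spectrum whose amplitude matches exp(log_amp) and whose
    # phase is the minimum phase implied by that amplitude.
    spectrum = np.exp(np.fft.fft(cepstrum * lifter))

    # Impulse response; the imaginary part is numerical noise only.
    taps = np.fft.ifft(spectrum).real

    # Naive truncation shortens the filter but is not accounted for
    # when the lifter is fixed.
    if tap_length is not None:
        taps = taps[:tap_length]
    return taps
```

With a fixed lifter, truncating the taps simply discards the tail of the impulse response, which is why a long filter is usually needed to keep the conversion accurate; the abstract's lifter-training method instead accounts for this truncation during training.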

research
01/25/2021

High-Quality Vocoding Design with Signal Processing for Speech Synthesis and Voice Conversion

This Ph.D. thesis focuses on developing a system for high-quality speech...
research
04/29/2021

Out-of-Band Power Reduction in NC-OFDM with Optimized Cancellation Carriers Selection

In this letter, we propose a computationally efficient method for joint ...
research
05/11/2020

Multi-band MelGAN: Faster Waveform Generation for High-Quality Text-to-Speech

In this paper, we propose multi-band MelGAN, a much faster waveform gene...
research
01/19/2018

Epoch-Synchronous Overlap-Add (ESOLA) for Time- and Pitch-Scale Modification of Speech Signals

Time- and pitch-scale modifications of speech signals find important app...
research
12/19/2020

Non-uniform FIR Digital Filter Bank for Hearing Aid Application Using Frequency Response Masking Technique: A Review

Hearing aid is an electroacoustic device used to selectively amplify the...
research
11/06/2017

Minimum-Phase HRTF Modeling of Pinna Spectral Notches using Group Delay Decomposition

Accurate reconstruction of HRTFs is important in the design and developm...
research
12/21/2017

On the Use of a Spectral Glottal Model for the Source-filter Separation of Speech

The estimation of glottal flow from a speech waveform is a key method fo...
