DDSP-based Singing Vocoders: A New Subtractive-based Synthesizer and A Comprehensive Evaluation

08/09/2022
by Da-Yi Wu, et al.

A vocoder is a conditional audio generation model that converts acoustic features such as mel-spectrograms into waveforms. Taking inspiration from Differentiable Digital Signal Processing (DDSP), we propose a new vocoder named SawSing for singing voices. SawSing synthesizes the harmonic part of singing voices by filtering a sawtooth source signal with a linear time-variant finite impulse response filter whose coefficients are estimated from the input mel-spectrogram by a neural network. As this approach enforces phase continuity, SawSing can generate singing voices without the phase-discontinuity glitch of many existing vocoders. Moreover, the source-filter assumption provides an inductive bias that allows SawSing to be trained on a small amount of data. Our experiments show that SawSing converges much faster and outperforms state-of-the-art generative adversarial network and diffusion-based vocoders in a resource-limited scenario with only 3 training recordings and a 3-hour training time.
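
The subtractive idea described in the abstract, a sawtooth source shaped by a time-varying FIR filter, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names `sawtooth_from_f0` and `ltv_fir_filter`, the hop size, and the frame-wise overlap-add scheme are assumptions made for illustration. The filter taps, which in SawSing would be estimated from the mel-spectrogram by a neural network, are simply passed in as an array here. The naive sawtooth below is also not band-limited, so it will alias at high pitches.

```python
import numpy as np


def sawtooth_from_f0(f0_hz, sample_rate):
    """Return a sawtooth in [-1, 1) from a per-sample f0 contour (hypothetical helper)."""
    # Accumulating phase sample by sample keeps the waveform continuous
    # even when f0 changes between samples.
    phase = np.cumsum(np.asarray(f0_hz, dtype=float) / sample_rate)
    return 2.0 * (phase % 1.0) - 1.0


def ltv_fir_filter(source, frame_filters, hop):
    """Apply one FIR filter per hop-sized frame and overlap-add the results.

    Assumes len(source) == frame_filters.shape[0] * hop.
    frame_filters has shape (n_frames, n_taps).
    """
    n_frames, n_taps = frame_filters.shape
    out = np.zeros(n_frames * hop + n_taps - 1)
    for i in range(n_frames):
        frame = source[i * hop:(i + 1) * hop]
        # Convolve this frame with its own taps; the filter tail spills
        # into the following frames and is summed there.
        seg = np.convolve(frame, frame_filters[i])
        out[i * hop:i * hop + len(seg)] += seg
    return out[:len(source)]


# Usage: a 200 Hz tone, 10 frames of 160 samples, with placeholder taps.
sample_rate, hop, n_frames, n_taps = 16000, 160, 10, 8
f0 = np.full(n_frames * hop, 200.0)
source = sawtooth_from_f0(f0, sample_rate)
taps = np.zeros((n_frames, n_taps))
taps[:, 0] = 1.0  # identity filter in every frame (stand-in for network output)
audio = ltv_fir_filter(source, taps, hop)
```

Because the source phase is accumulated across the whole signal rather than reset per frame, frame boundaries do not introduce jumps in the excitation, which is the property the abstract credits for avoiding phase-discontinuity glitches.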

research
04/26/2023

Source-Filter-Based Generative Adversarial Neural Vocoder for High Fidelity Speech Synthesis

This paper proposes a source-filter-based generative adversarial neural ...
research
06/19/2023

Vocal Timbre Effects with Differentiable Digital Signal Processing

We explore two approaches to creatively altering vocal timbre using Diff...
research
08/27/2019

Neural Harmonic-plus-Noise Waveform Model with Trainable Maximum Voice Frequency for Text-to-Speech Synthesis

Neural source-filter (NSF) models are deep neural networks that produce ...
research
06/29/2023

Singing Voice Synthesis Using Differentiable LPC and Glottal-Flow-Inspired Wavetables

This paper introduces GlOttal-flow LPC Filter (GOLF), a novel method for...
research
04/06/2018

Generative adversarial network-based approach to signal reconstruction from magnitude spectrograms

In this paper, we address the problem of reconstructing a time-domain si...
research
06/01/2023

Differentiable Allpass Filters for Phase Response Estimation and Automatic Signal Alignment

Virtual analog (VA) audio effects are increasingly based on neural netwo...
research
01/14/2020

DDSP: Differentiable Digital Signal Processing

Most generative models of audio directly generate samples in one of two ...