Anyone GAN Sing

02/22/2021
by   Shreeviknesh Sankaran, et al.
0

The problem of audio synthesis has been increasingly solved using deep neural networks. With the introduction of Generative Adversarial Networks (GAN), another efficient and adjective path has opened up to solve this problem. In this paper, we present a method to synthesize the singing voice of a person using a Convolutional Long Short-term Memory (ConvLSTM) based GAN optimized using the Wasserstein loss function. Our work is inspired by WGANSing by Chandna et al. Our model inputs consecutive frame-wise linguistic and frequency features, along with singer identity and outputs vocoder features. We train the model on a dataset of 48 English songs sung and spoken by 12 non-professional singers. For inference, sequential blocks are concatenated using an overlap-add procedure. We test the model using the Mel-Cepstral Distance metric and a subjective listening test with 18 participants.

READ FULL TEXT

page 3

page 4

research
03/26/2019

WGANSing: A Multi-Voice Singing Voice Synthesizer Based on the Wasserstein-GAN

We present a deep neural network based singing voice synthesizer, inspir...
research
05/02/2018

Text to Image Synthesis Using Generative Adversarial Networks

Generating images from natural language is one of the primary applicatio...
research
09/13/2019

Spoken Speech Enhancement using EEG

In this paper we demonstrate spoken speech enhancement using electroence...
research
11/27/2017

HP-GAN: Probabilistic 3D human motion prediction via GAN

Predicting and understanding human motion dynamics has many applications...
research
03/06/2019

Conditional GANs For Painting Generation

We examined the use of modern Generative Adversarial Nets to generate no...
research
01/09/2017

AdaGAN: Boosting Generative Models

Generative Adversarial Networks (GAN) (Goodfellow et al., 2014) are an e...

Please sign up or login with your details

Forgot password? Click here to reset