Timbre Transfer with Variational Auto Encoding and Cycle-Consistent Adversarial Networks

by Russell Sammut Bonnici, et al.

This research project investigates the application of deep learning to timbre transfer, where the timbre of a source audio signal is converted to the timbre of a target audio signal with minimal loss in quality. The adopted approach combines Variational Autoencoders with Generative Adversarial Networks to construct meaningful representations of the source audio and produce realistic generations of the target audio. It is applied to the Flickr 8k Audio dataset for transferring vocal timbre between speakers and to the URMP dataset for transferring musical timbre between instruments. Furthermore, variations of the adopted approach are trained, and their generalised performance is compared using the metrics SSIM (Structural Similarity Index) and FAD (Fréchet Audio Distance). It was found that a many-to-many approach outperforms a one-to-one approach in terms of reconstructive capabilities, and that a basic residual block design is more suitable than a bottleneck design for enriching the content information in the latent space. It was also found that the choice between a variational autoencoder and a vanilla autoencoder approach for the cyclic loss does not have a significant impact on the reconstructive and adversarial translation aspects of the model.
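As a rough illustration of the idea behind FAD, the sketch below computes a Fréchet distance between two sets of embedding vectors, each modelled as a Gaussian. It is not the evaluation code used in this work: the function name `diagonal_frechet_distance` is hypothetical, the embeddings are toy lists rather than outputs of an audio embedding model, and the covariances are assumed to be diagonal so that the distance reduces to a per-dimension sum. The full metric uses full covariance matrices.

```python
import math
from statistics import fmean, pvariance


def diagonal_frechet_distance(real, generated):
    """Frechet distance between two sets of embedding vectors.

    Simplifying assumption (not part of the standard FAD definition):
    each set is modelled as a Gaussian with a diagonal covariance, so the
    distance is sum_d (mu_r - mu_g)^2 + (sigma_r - sigma_g)^2.
    """
    dims = len(real[0])
    total = 0.0
    for d in range(dims):
        # Collect the d-th coordinate of every embedding in each set.
        r = [vec[d] for vec in real]
        g = [vec[d] for vec in generated]
        mu_r, mu_g = fmean(r), fmean(g)
        # Population variance per dimension, reusing the computed means.
        var_r, var_g = pvariance(r, mu_r), pvariance(g, mu_g)
        total += (mu_r - mu_g) ** 2 + (math.sqrt(var_r) - math.sqrt(var_g)) ** 2
    return total


# Toy embeddings: the generated set is the real set shifted by 1 in each dimension.
real_embeddings = [[0.0, 0.0], [2.0, 2.0]]
generated_embeddings = [[1.0, 1.0], [3.0, 3.0]]
distance = diagonal_frechet_distance(real_embeddings, generated_embeddings)
```

A lower value means the statistics of the generated embeddings sit closer to those of the reference set; identical sets give a distance of zero.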




RAVE: A variational autoencoder for fast and high-quality neural audio synthesis

Deep generative models applied to audio have improved by a large margin ...

Modulated Variational auto-Encoders for many-to-many musical timbre transfer

Generative models have been successfully applied to image style transfer...

Introducing Latent Timbre Synthesis

We present the Latent Timbre Synthesis (LTS), a new audio synthesis meth...

VaPar Synth – A Variational Parametric Model for Audio Synthesis

With the advent of data-driven statistical modeling and abundant computi...

Mic2Mic: Using Cycle-Consistent Generative Adversarial Networks to Overcome Microphone Variability in Speech Systems

Mobile and embedded devices are increasingly using microphones and audio...

HpRNet : Incorporating Residual Noise Modeling for Violin in a Variational Parametric Synthesizer

Generative Models for Audio Synthesis have been gaining momentum in the ...

Latent Vector Recovery of Audio GANs

Advanced Generative Adversarial Networks (GANs) are remarkable in genera...
