High-Fidelity Audio Generation and Representation Learning with Guided Adversarial Autoencoder

by Kazi Nazmul Haque, et al.

Unsupervised disentangled representation learning from unlabelled audio data and high-fidelity audio generation have become two linchpins of machine learning research. However, a representation learned in an unsupervised setting is not guaranteed to be usable for the downstream task at hand, which wastes resources if the training was conducted for that particular task. Conversely, if the model is highly biased towards the downstream task during representation learning, it loses its generalisation capability: the bias directly benefits that task, but the ability to scale the representation to other related tasks is lost. To fill this gap, we propose a new autoencoder-based model named "Guided Adversarial Autoencoder (GAAE)", which leverages a small percentage of labelled samples to learn both post-task-specific representations and a general representation capturing the factors of variation in the training data, making it suitable for future related tasks. Furthermore, our proposed model can generate audio of superior quality that is indistinguishable from real audio samples. Through extensive experiments, we demonstrate that, by harnessing the power of high-fidelity audio generation, the proposed GAAE model can learn powerful representations from an unlabelled dataset while using only a small percentage of labelled data as supervision/guidance.




