Guided Generative Adversarial Neural Network for Representation Learning and High Fidelity Audio Generation using Fewer Labelled Audio Data

by   Kazi Nazmul Haque, et al.

Recent improvements in Generative Adversarial Neural Networks (GANs) have shown their ability to generate higher quality samples as well as to learn good representations for transfer learning. Most of the representation learning methods based on GANs learn representations ignoring their post-use scenario, which can lead to increased generalisation ability. However, the model can become redundant if it is intended for a specific task. For example, assume we have a vast unlabelled audio dataset, and we want to learn a representation from this dataset so that it can be used to improve the emotion recognition performance of a small labelled audio dataset. During the representation learning training, if the model does not know the post emotion recognition task, it can completely ignore emotion-related characteristics in the learnt representation. This is a fundamental challenge for any unsupervised representation learning model. In this paper, we aim to address this challenge by proposing a novel GAN framework: Guided Generative Neural Network (GGAN), which guides a GAN to focus on learning desired representations and generating superior quality samples for audio data leveraging fewer labelled samples. Experimental results show that using a very small amount of labelled data as guidance, a GGAN learns significantly better representations.


page 1

page 7


High-Fidelity Audio Generation and Representation Learning with Guided Adversarial Autoencoder

Unsupervised disentangled representation learning from the unlabelled au...

Augmenting Generative Adversarial Networks for Speech Emotion Recognition

Generative adversarial networks (GANs) have shown potential in learning ...

Self-supervised Graphs for Audio Representation Learning with Limited Labeled Data

Large scale databases with high-quality manual annotations are scarce in...

On the Limits of Learning Representations with Label-Based Supervision

Advances in neural network based classifiers have transformed automatic ...

OMG Emotion Challenge - ExCouple Team

The proposed model is only for the audio module. All videos in the OMG E...

On Enhancing Speech Emotion Recognition using Generative Adversarial Networks

Generative Adversarial Networks (GANs) have gained a lot of attention fr...

Generative Adversarial Networks for Multimodal Representation Learning in Video Hyperlinking

Continuous multimodal representations suitable for multimodal informatio...

Please sign up or login with your details

Forgot password? Click here to reset