A Benchmark of Dynamical Variational Autoencoders applied to Speech Spectrogram Modeling

06/11/2021
by   Xiaoyu Bie, et al.
0

The Variational Autoencoder (VAE) is a powerful deep generative model that is now extensively used to represent high-dimensional complex data via a low-dimensional latent space learned in an unsupervised manner. In the original VAE model, input data vectors are processed independently. In recent years, a series of papers have presented different extensions of the VAE to process sequential data, that not only model the latent space, but also model the temporal dependencies within a sequence of data vectors and corresponding latent vectors, relying on recurrent neural networks. We recently performed a comprehensive review of those models and unified them into a general class called Dynamical Variational Autoencoders (DVAEs). In the present paper, we present the results of an experimental benchmark comparing six of those DVAE models on the speech analysis-resynthesis task, as an illustration of the high potential of DVAEs for speech modeling.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
08/28/2020

Dynamical Variational Autoencoders: A Comprehensive Review

The Variational Autoencoder (VAE) is a powerful deep generative model th...
research
03/07/2023

Speech Modeling with a Hierarchical Transformer Dynamical VAE

The dynamical variational autoencoders (DVAEs) are a family of latent-va...
research
04/14/2022

Learning and controlling the source-filter representation of speech with a variational autoencoder

Understanding and controlling latent representations in deep generative ...
research
05/05/2023

A Multimodal Dynamical Variational Autoencoder for Audiovisual Speech Representation Learning

In this paper, we present a multimodal and dynamical VAE (MDVAE) applied...
research
11/25/2018

Sequential Variational Autoencoders for Collaborative Filtering

Variational autoencoders were proven successful in domains such as compu...
research
10/04/2021

Assessing glaucoma in retinal fundus photographs using Deep Feature Consistent Variational Autoencoders

One of the leading causes of blindness is glaucoma, which is challenging...
research
07/20/2020

It's LeVAsa not LevioSA! Latent Encodings for Valence-Arousal Structure Alignment

In recent years, great strides have been made in the field of affective ...

Please sign up or login with your details

Forgot password? Click here to reset