Learning multi-modal generative models with permutation-invariant encoders and tighter variational bounds

09/01/2023
by   Marcel Hirt, et al.

Devising deep latent variable models for multi-modal data has been a long-standing theme in machine learning research. Multi-modal Variational Autoencoders (VAEs) are a popular class of generative models that learn latent representations jointly explaining multiple modalities. Various objective functions for such models have been suggested, often motivated as lower bounds on the multi-modal data log-likelihood or from information-theoretic considerations. To encode latent variables from different modality subsets, Product-of-Experts (PoE) or Mixture-of-Experts (MoE) aggregation schemes have been routinely used and shown to yield different trade-offs, for instance, regarding their generative quality or consistency across multiple modalities. In this work, we consider a variational bound that can tightly lower bound the data log-likelihood. We develop more flexible aggregation schemes that generalise PoE and MoE approaches by combining encoded features from different modalities based on permutation-invariant neural networks. Our numerical experiments illustrate trade-offs for multi-modal variational bounds and various aggregation schemes. We show that tighter variational bounds and more flexible aggregation models become beneficial when one wants to approximate the true joint distribution over observed modalities and latent variables in identifiable models.
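To make the contrast between the aggregation schemes concrete, the following minimal sketch (not the authors' implementation; all module names, layer sizes, and the use of a standard-normal prior expert are illustrative assumptions) shows a Product-of-Experts combination of per-modality Gaussian posteriors next to a permutation-invariant, DeepSets-style pooling aggregator that maps a set of encoded modality features to joint posterior parameters.

```python
# Hypothetical sketch contrasting PoE aggregation with a permutation-invariant
# pooling aggregator for multi-modal VAE encoders.
import torch
import torch.nn as nn


def poe_gaussian(mus, logvars):
    """Product of Gaussian experts, including a N(0, I) prior expert.

    mus, logvars: tensors of shape (num_modalities, batch, latent_dim).
    The product of Gaussian densities is again Gaussian; its precision is the
    sum of the experts' precisions.
    """
    precisions = torch.exp(-logvars)                 # 1 / sigma^2 per expert
    joint_precision = precisions.sum(dim=0) + 1.0    # +1.0 from the prior expert
    joint_var = 1.0 / joint_precision
    joint_mu = joint_var * (precisions * mus).sum(dim=0)
    return joint_mu, torch.log(joint_var)


class PooledAggregator(nn.Module):
    """Permutation-invariant aggregation: embed each modality's encoded
    features, sum-pool over the (unordered) set of observed modalities, and
    map the pooled vector to Gaussian posterior parameters."""

    def __init__(self, feature_dim=64, hidden_dim=128, latent_dim=16):
        super().__init__()
        self.embed = nn.Sequential(nn.Linear(feature_dim, hidden_dim), nn.ReLU())
        self.to_params = nn.Linear(hidden_dim, 2 * latent_dim)

    def forward(self, features):
        # features: (num_modalities, batch, feature_dim); summing over the
        # modality axis makes the output invariant to modality ordering.
        pooled = self.embed(features).sum(dim=0)
        mu, logvar = self.to_params(pooled).chunk(2, dim=-1)
        return mu, logvar


if __name__ == "__main__":
    M, B, F, D = 3, 8, 64, 16    # modalities, batch, feature dim, latent dim
    mu, logvar = poe_gaussian(torch.randn(M, B, D), torch.randn(M, B, D))
    print(mu.shape)                                   # torch.Size([8, 16])
    agg = PooledAggregator(feature_dim=F, latent_dim=D)
    print(agg(torch.randn(M, B, F))[0].shape)         # torch.Size([8, 16])
```

Because both aggregators accept the modality encodings as an unordered set, either can be applied to arbitrary modality subsets at test time; the learned pooling aggregator is strictly more flexible than the fixed PoE combination rule, at the cost of additional parameters.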


