Diverse and Accurate Image Description Using a Variational Auto-Encoder with an Additive Gaussian Encoding Space

11/19/2017
by Liwei Wang, et al.

This paper explores image caption generation using conditional variational auto-encoders (CVAEs). Standard CVAEs with a fixed Gaussian prior yield descriptions with too little variability. Instead, we propose two models that explicitly structure the latent space around K components corresponding to different types of image content, and combine components to create priors for images that contain multiple types of content simultaneously (e.g., several kinds of objects). Our first model uses a Gaussian Mixture model (GMM) prior, while the second one defines a novel Additive Gaussian (AG) prior that linearly combines component means. We show that both models produce captions that are more diverse and more accurate than a strong LSTM baseline or a "vanilla" CVAE with a fixed Gaussian prior, with AG-CVAE showing particular promise.
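The abstract's core idea can be illustrated with a small numerical sketch. The snippet below shows the Additive Gaussian (AG) prior as described above: the prior mean for an image is a linear combination of K component means, weighted by the types of content present in that image. All concrete names, dimensions, and weight values here are illustrative assumptions, not taken from the paper.

```python
import numpy as np

# Sketch of the AG-CVAE prior (assumptions: K content types, latent dim D,
# per-image content weights c_k summing to 1, e.g. object-detector scores).
K, D = 5, 8
rng = np.random.default_rng(0)
mu = rng.normal(size=(K, D))   # one learned mean vector per content type

# Hypothetical image containing two content types with equal weight.
c = np.array([0.5, 0.5, 0.0, 0.0, 0.0])

# AG prior: a single Gaussian whose mean linearly combines component means.
# (A GMM prior would instead sample ONE component k ~ c, then z ~ N(mu_k, .).)
prior_mean = c @ mu                 # weighted sum of means, shape (D,)
prior_std = np.full(D, 0.1)         # fixed isotropic std (illustrative value)

# Sample a latent code z from the image-conditioned prior; a caption decoder
# (omitted here) would then generate text conditioned on z and image features.
z = prior_mean + prior_std * rng.normal(size=D)
```

Because the AG prior stays a single Gaussian, it keeps the standard closed-form KL term of a CVAE while still letting the prior shift with image content.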


Related research:

- Fixing Gaussian Mixture VAEs for Interpretable Text Generation (06/16/2019): Variational auto-encoder (VAE) with Gaussian priors is effective in text...
- Variational auto-encoders with Student's t-prior (04/06/2020): We propose a new structure for the variational auto-encoders (VAEs) prio...
- Improving VAE generations of multimodal data through data-dependent conditional priors (11/25/2019): One of the major shortcomings of variational autoencoders is the inabili...
- Variational Auto-Decoder (03/03/2019): Learning a generative model from partial data (data with missingness) is...
- Disentangling Latent Space for VAE by Label Relevant/Irrelevant Dimensions (12/22/2018): VAE requires the standard Gaussian distribution as a prior in the latent...
- Disentangling Disentanglement (12/06/2018): We develop a generalised notion of disentanglement in Variational Auto-E...
- Learning Correlated Latent Representations with Adaptive Priors (06/14/2019): Variational Auto-Encoders (VAEs) have been widely applied for learning c...
