Mutual Information Based Method for Unsupervised Disentanglement of Video Representation

by P Aditya Sreekar, et al.

Video prediction is the challenging task of predicting future frames from a given set of context frames that belong to a video sequence. Video prediction models have prospective applications in maneuver planning, health care, autonomous navigation, and simulation. One of the major challenges in future frame generation is the high-dimensional nature of visual data. In this work, we propose the Mutual Information Predictive Auto-Encoder (MIPAE) framework, which simplifies the task of predicting high-dimensional video frames by factorising video representations into content and low-dimensional pose latent variables that are easy to predict. A standard LSTM network is used to predict these low-dimensional pose representations. The content and the predicted pose representations are decoded to generate future frames. Our approach leverages the temporal structure of the latent generative factors of a video and a novel mutual information loss to learn disentangled video representations. We also propose a metric based on mutual information gap (MIG) to quantitatively assess the effectiveness of disentanglement on the DSprites and MPI3D-real datasets. MIG scores corroborate the visual superiority of frames predicted by MIPAE. We also compare our method quantitatively on the evaluation metrics LPIPS, SSIM and PSNR.
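The abstract does not spell out how the proposed MIG-based metric is computed. As a rough illustration only, the sketch below implements the commonly used definition of the mutual information gap for discrete data: for each ground-truth factor, take the difference between the two largest mutual informations over the latent dimensions, normalise by the factor's entropy, and average over factors. The function names (`entropy`, `mutual_information`, `mig`) and the assumption that latents have already been discretised are ours, not the paper's.

```python
import math
from collections import Counter


def entropy(xs):
    """Empirical Shannon entropy (in nats) of a sequence of hashable values."""
    n = len(xs)
    return -sum((c / n) * math.log(c / n) for c in Counter(xs).values())


def mutual_information(xs, ys):
    """Empirical I(X;Y) = H(X) + H(Y) - H(X,Y) for two aligned sequences."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))


def mig(latents, factors):
    """Standard mutual information gap (assumed definition, not the paper's variant).

    latents: list of columns, one per (already discretised) latent dimension.
    factors: list of columns, one per ground-truth generative factor.
    Requires at least two latent dimensions and non-constant factors.
    """
    gaps = []
    for v in factors:
        # Mutual information between this factor and every latent dimension.
        mis = sorted((mutual_information(z, v) for z in latents), reverse=True)
        # Gap between the best and second-best latent, normalised by H(v).
        gaps.append((mis[0] - mis[1]) / entropy(v))
    return sum(gaps) / len(gaps)
```

With this definition, a representation in which each factor is captured by exactly one latent dimension scores close to 1, while a representation that duplicates the same information across dimensions scores close to 0.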




Learning to Decompose and Disentangle Representations for Video Prediction

Our goal is to predict future video frames given a sequence of input fra...

Unsupervised Learning of Disentangled Representations from Video

We present a new model DrNET that learns disentangled image representati...

HSIC-InfoGAN: Learning Unsupervised Disentangled Representations by Maximising Approximated Mutual Information

Learning disentangled representations requires either supervision or the...

Efficient training for future video generation based on hierarchical disentangled representation of latent variables

Generating videos predicting the future of a given sequence has been an ...

Video Infringement Detection via Feature Disentanglement and Mutual Information Maximization

The self-media era provides us tremendous high quality videos. Unfortuna...

Action-conditioned Benchmarking of Robotic Video Prediction Models: a Comparative Study

A defining characteristic of intelligent systems is the ability to make ...

Unsupervised Scientific Abstract Segmentation with Normalized Mutual Information

The abstracts of scientific papers consist of premises and conclusions. ...
