Unsupervised and interpretable scene discovery with Discrete-Attend-Infer-Repeat

by Duo Wang, et al.

In this work we present Discrete Attend Infer Repeat (Discrete-AIR), a recurrent auto-encoder with structured latent distributions comprising discrete categorical distributions, continuous attribute distributions, and factorised spatial attention. While inspired by the original AIR model and retaining its capability to identify objects in an image, Discrete-AIR provides direct interpretability of the latent codes. We show that for Multi-MNIST and a multiple-object version of the dSprites dataset (Multi-Sprites), the Discrete-AIR model needs just one categorical latent variable, one attribute variable (for Multi-MNIST only), together with spatial attention variables, for efficient inference. Our analysis shows that the learnt categorical distributions effectively capture the categories of objects in the scene for both Multi-MNIST and Multi-Sprites.
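To make the latent structure described above concrete, the following is a minimal sketch of a per-object latent code with one categorical variable, an optional continuous attribute, and spatial attention variables. It only samples from simple placeholder priors and is not the paper's inference network. The names `ObjectLatent` and `sample_scene`, the presence probability `p_present`, and every prior distribution used here are assumptions made for illustration.

```python
import random
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ObjectLatent:
    """Hypothetical per-object latent code mirroring the abstract's structure."""
    category: int               # discrete categorical variable (object class)
    attribute: Optional[float]  # continuous attribute; None when unused
    scale: float                # spatial attention: size of the attended window
    shift_x: float              # spatial attention: horizontal position
    shift_y: float              # spatial attention: vertical position


def sample_scene(num_categories: int,
                 max_objects: int,
                 use_attribute: bool,
                 rng: random.Random,
                 p_present: float = 0.7) -> List[ObjectLatent]:
    """Sample latents one object at a time, stopping when no object is present.

    All priors below are illustrative assumptions: uniform categories,
    a standard normal attribute, and uniform attention parameters.
    """
    objects: List[ObjectLatent] = []
    for _ in range(max_objects):
        # Recurrent "repeat" step: decide whether another object exists.
        if rng.random() >= p_present:
            break
        objects.append(ObjectLatent(
            category=rng.randrange(num_categories),
            attribute=rng.gauss(0.0, 1.0) if use_attribute else None,
            scale=rng.uniform(0.3, 1.0),
            shift_x=rng.uniform(-1.0, 1.0),
            shift_y=rng.uniform(-1.0, 1.0),
        ))
    return objects


if __name__ == "__main__":
    # Multi-MNIST-like setting: 10 digit categories plus one attribute variable.
    for obj in sample_scene(10, 3, True, random.Random(0)):
        print(obj)
```

Reading the `category` field directly is what the abstract means by interpretable codes: each attended object carries an explicit class index rather than an entangled continuous vector.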




Diffusion Probabilistic Models for Scene-Scale 3D Categorical Data

In this paper, we learn a diffusion model to generate 3D data on a scene...

Sequential Attend, Infer, Repeat: Generative Modelling of Moving Objects

We present Sequential Attend, Infer, Repeat (SQAIR), an interpretable de...

Joint-VAE: Learning Disentangled Joint Continuous and Discrete Representations

We present a framework for learning disentangled and interpretable joint...

Identifying Interpretable Discrete Latent Structures from Discrete Data

High dimensional categorical data are routinely collected in biomedical ...

Latent Gaussian Processes for Distribution Estimation of Multivariate Categorical Data

Multivariate categorical data occur in many applications of machine lear...

Sparse Communication via Mixed Distributions

Neural networks and other machine learning models compute continuous rep...
