VAE-CE: Visual Contrastive Explanation using Disentangled VAEs
The goal of a classification model is to assign the correct labels to data. In most cases, this data is not fully described by the given set of labels. Often a rich set of meaningful concepts exist in the domain that can much more precisely describe each datapoint. Such concepts can also be highly useful for interpreting the model's classifications. In this paper we propose a model, denoted as Variational Autoencoder-based Contrastive Explanation (VAE-CE), that represents data with high-level concepts and uses this representation for both classification and generating explanations. The explanations are produced in a contrastive manner, conveying why a datapoint is assigned to one class rather than an alternative class. An explanation is specified as a set of transformations of the input datapoint, with each step depicting a concept changing towards the contrastive class. We build the model using a disentangled VAE, extended with a new supervised method for disentangling individual dimensions. An analysis on synthetic data and MNIST shows that the approaches to both disentanglement and explanation provide benefits over other methods.
READ FULL TEXT