Topic Modeling with Wasserstein Autoencoders

07/24/2019
by Feng Nan, et al.

We propose a novel neural topic model in the Wasserstein autoencoder (WAE) framework. Unlike existing variational autoencoder based models, we directly enforce a Dirichlet prior on the latent document-topic vectors. We exploit the structure of the latent space and apply a suitable kernel in minimizing the Maximum Mean Discrepancy (MMD) to perform distribution matching. We find that MMD performs much better than a Generative Adversarial Network (GAN) in matching high-dimensional Dirichlet distributions. We further find that incorporating randomness in the encoder output during training leads to significantly more coherent topics. To measure the diversity of the produced topics, we propose a simple topic uniqueness metric. Together with the widely used coherence measure NPMI, it offers a more holistic evaluation of topic quality. Experiments on several real datasets show that our model produces significantly better topics than existing topic models.
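To make the distribution-matching idea concrete, here is a minimal sketch of an unbiased squared-MMD estimate between two sets of document-topic vectors. It is an illustration under stated assumptions, not the authors' implementation: the Gaussian RBF kernel is a generic stand-in for the "suitable kernel" the abstract mentions, and the function names, the bandwidth, and the Dirichlet parameters are hypothetical choices made for the example.

```python
import math
import random


def sample_dirichlet(alpha, rng):
    """Draw one Dirichlet(alpha) sample by normalizing independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def rbf_kernel(x, y, bandwidth=1.0):
    """Gaussian RBF kernel; an assumed stand-in, not the kernel used in the paper."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def mmd_squared(xs, ys, kernel=rbf_kernel):
    """Unbiased estimate of squared MMD between samples xs and ys."""
    n, m = len(xs), len(ys)
    # Within-sample terms exclude the diagonal (i == j) to keep the estimate unbiased.
    k_xx = sum(kernel(xs[i], xs[j])
               for i in range(n) for j in range(n) if i != j) / (n * (n - 1))
    k_yy = sum(kernel(ys[i], ys[j])
               for i in range(m) for j in range(m) if i != j) / (m * (m - 1))
    k_xy = sum(kernel(x, y) for x in xs for y in ys) / (n * m)
    return k_xx + k_yy - 2.0 * k_xy


rng = random.Random(0)
num_topics, num_samples = 5, 100

# A sparse Dirichlet prior: most of the mass sits near the simplex vertices.
prior = [sample_dirichlet([0.1] * num_topics, rng) for _ in range(num_samples)]
# A second draw from the same prior, standing in for well-matched encoder outputs.
matched = [sample_dirichlet([0.1] * num_topics, rng) for _ in range(num_samples)]
# Draws concentrated near the simplex center, standing in for poorly matched outputs.
mismatched = [sample_dirichlet([10.0] * num_topics, rng) for _ in range(num_samples)]

mmd_same = mmd_squared(prior, matched)
mmd_far = mmd_squared(prior, mismatched)
print(f"matched: {mmd_same:.4f}  mismatched: {mmd_far:.4f}")
```

In a WAE-style setup a quantity of this kind would be added to the reconstruction loss as a penalty, so that the encoder's outputs over a batch are pushed toward the prior. The estimate stays near zero when both samples come from the same distribution and grows as they diverge.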

research
09/29/2020

Neural Topic Modeling with Cycle-Consistent Adversarial Training

Advances on deep generative models have attracted significant research i...
research
04/05/2018

Sliced-Wasserstein Autoencoder: An Embarrassingly Simple Generative Model

In this paper we study generative modeling via autoencoders while using ...
research
05/25/2023

Diversity-Aware Coherence Loss for Improving Neural Topic Models

The standard approach for neural topic modeling uses a variational autoe...
research
06/08/2023

A modified model for topic detection from a corpus and a new metric evaluating the understandability of topics

This paper presents a modified neural model for topic detection from a c...
research
11/30/2017

Feature discovery and visualization of robot mission data using convolutional autoencoders and Bayesian nonparametric topic models

The gap between our ability to collect interesting data and our ability ...
research
03/27/2023

Improving Contextualized Topic Models with Negative Sampling

Topic modeling has emerged as a dominant method for exploring large docu...
research
05/04/2022

Seed-Guided Topic Discovery with Out-of-Vocabulary Seeds

Discovering latent topics from text corpora has been studied for decades...
