Variable Selection for Latent Dirichlet Allocation

05/04/2012
by Dongwoo Kim, et al.

In latent Dirichlet allocation (LDA), topics are multinomial distributions over the entire vocabulary. However, the vocabulary usually contains many words that are not relevant to forming the topics. We take a variable selection method widely used in statistical modeling as a dimension reduction tool and combine it with LDA. In this variable selection model for LDA (vsLDA), topics are multinomial distributions over a subset of the vocabulary. By excluding words that are not informative for finding the latent topic structure of the corpus, vsLDA finds topics that are more robust and discriminative. We compare three models (vsLDA, LDA with symmetric priors, and LDA with asymmetric priors) on held-out likelihood, MCMC chain consistency, and document classification. vsLDA outperforms symmetric LDA on likelihood and classification and outperforms asymmetric LDA on consistency and classification. In the remaining comparisons it performs about the same.
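The key structural difference from LDA is that a vsLDA topic places probability mass only on a subset of the vocabulary. The sketch below illustrates that idea on the generative side only and relies on several assumptions. Each topic's subset is drawn with independent Bernoulli indicators (the probability `select_prob` is hypothetical). The topic's weights come from a symmetric Dirichlet over the selected words only. Documents are then generated exactly as in ordinary LDA. The function names and parameters are illustrative. This is not the authors' model specification or their MCMC inference procedure, which infers from the data which words to keep.

```python
import random


def sample_dirichlet(alpha, size, rng):
    """Draw a symmetric Dirichlet(alpha) vector of the given size via normalized Gamma draws."""
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(size)]
    total = sum(draws)
    if total == 0.0:  # guard against underflow for very small alpha
        return [1.0 / size] * size
    return [d / total for d in draws]


def sample_subset_topic(vocab_size, select_prob, beta, rng):
    """Hypothetical sketch: choose a word subset with Bernoulli(select_prob)
    indicators, then draw a multinomial topic with zero mass outside it."""
    selected = [v for v in range(vocab_size) if rng.random() < select_prob]
    if not selected:  # ensure the topic covers at least one word
        selected = [rng.randrange(vocab_size)]
    weights = sample_dirichlet(beta, len(selected), rng)
    topic = [0.0] * vocab_size  # unselected words keep probability 0
    for v, w in zip(selected, weights):
        topic[v] = w
    return selected, topic


def generate_document(topics, alpha, length, rng):
    """Ordinary LDA document generation: draw per-document topic proportions,
    then a topic and a word id for each token."""
    theta = sample_dirichlet(alpha, len(topics), rng)
    words = []
    for _ in range(length):
        k = rng.choices(range(len(topics)), weights=theta)[0]
        w = rng.choices(range(len(topics[k])), weights=topics[k])[0]
        words.append(w)
    return words


if __name__ == "__main__":
    rng = random.Random(0)
    # Three topics over a 50-word vocabulary, each restricted to roughly 30% of the words.
    topics = [sample_subset_topic(50, 0.3, 0.5, rng)[1] for _ in range(3)]
    print(generate_document(topics, 0.5, 20, rng))
```

Under these assumptions, a word outside every topic's subset can never be generated. This mirrors the abstract's point that uninformative words are excluded from the topics instead of receiving small but nonzero probability in each of them.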
