Inference in topic models: sparsity and trade-off

12/10/2015
by Khoat Than, et al.

Topic models are popular for modeling discrete data (e.g., texts, images, videos, links), and provide an efficient way to discover hidden structures/semantics in massive data. One of the core problems in this field is posterior inference for individual data instances. This problem is particularly important in streaming environments, but is often intractable. In this paper, we investigate the use of the Frank-Wolfe algorithm (FW) for recovering sparse solutions to posterior inference. Through a detailed analysis of both theoretical and practical aspects, we show that FW exhibits many interesting properties which are beneficial to topic modeling. We then employ FW to design fast methods, including ML-FW, for learning latent Dirichlet allocation (LDA) at large scales. Extensive experiments show that, to reach the same level of predictiveness, ML-FW can perform tens to thousands of times faster than existing state-of-the-art methods for learning LDA from massive/streaming data.
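To make the idea of sparse inference with Frank-Wolfe concrete, here is a minimal sketch, not the paper's implementation. It assumes the inference objective is the document log-likelihood, sum_j d_j log(sum_k theta_k beta_kj), maximized over the simplex of topic proportions, and it uses the generic 2/(t+2) step size. The function name `frank_wolfe_inference` and its parameters are hypothetical, and all topic probabilities are assumed to be strictly positive.

```python
import math


def frank_wolfe_inference(counts, topics, n_iters=20):
    """Sketch of Frank-Wolfe inference of topic proportions for one document.

    counts: list of word counts d_j for the document (length V).
    topics: list of K topics, each a list of V strictly positive
            word probabilities (assumption: no zero entries).
    Returns theta, a list of K non-negative proportions summing to 1.
    """
    n_topics = len(topics)

    def loglik(x):
        # Log-likelihood of the document under word distribution x.
        return sum(c * math.log(xj) for c, xj in zip(counts, x) if c > 0)

    # Start at the simplex vertex (single topic) with the best likelihood.
    start = max(range(n_topics), key=lambda k: loglik(topics[k]))
    theta = [0.0] * n_topics
    theta[start] = 1.0
    x = list(topics[start])  # x_j = sum_k theta_k * beta_kj

    for t in range(1, n_iters + 1):
        # Gradient with respect to theta_k: sum_j d_j * beta_kj / x_j.
        grad = [
            sum(c * b / xj for c, b, xj in zip(counts, topics[k], x) if c > 0)
            for k in range(n_topics)
        ]
        # Linear maximization over the simplex picks a single vertex.
        best = max(range(n_topics), key=grad.__getitem__)
        step = 2.0 / (t + 2)
        # Move toward that vertex; at most one new non-zero entry appears.
        theta = [(1.0 - step) * th for th in theta]
        theta[best] += step
        x = [(1.0 - step) * xj + step * b for xj, b in zip(x, topics[best])]

    return theta
```

Because each iteration moves toward a single vertex of the simplex, the result after T iterations has at most T + 1 non-zero topic proportions. The iteration count therefore trades off sparsity (and speed) against solution quality, which is the kind of trade-off the title refers to.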

research
12/10/2015

Guaranteed inference in topic models

One of the core problems in statistical models is the estimation of a po...
research
09/08/2019

Evaluating Topic Quality with Posterior Variability

Probabilistic topic models such as latent Dirichlet allocation (LDA) are...
research
06/26/2015

An Empirical Study of Stochastic Variational Algorithms for the Beta Bernoulli Process

Stochastic variational inference (SVI) is emerging as the most promising...
research
04/07/2016

Combinatorial Topic Models using Small-Variance Asymptotics

Topic models have emerged as fundamental tools in unsupervised machine l...
research
06/11/2015

Sparse Partially Collapsed MCMC for Parallel Inference in Topic Models

Topic models, and more specifically the class of Latent Dirichlet Alloca...
research
01/16/2015

Bayesian Nonparametrics in Topic Modeling: A Brief Tutorial

Using nonparametric methods has been increasingly explored in Bayesian h...
research
03/11/2008

Component models for large networks

Being among the easiest ways to find meaningful structure from discrete ...
