Using Variational Inference and MapReduce to Scale Topic Modeling

07/19/2011
by Ke Zhai, et al.

Latent Dirichlet Allocation (LDA) is a popular topic modeling technique for exploring document collections. Because large datasets are increasingly prevalent, inference for LDA needs to scale better. In this paper, we propose a technique called MapReduce LDA (Mr. LDA) that accommodates very large corpora in the MapReduce framework. In contrast to other techniques for scaling LDA inference, which use Gibbs sampling, we use variational inference. Our solution distributes computation efficiently and is relatively simple to implement. More importantly, this variational implementation, unlike highly tuned and specialized implementations, is easily extensible. We demonstrate two extensions of the model made possible by this scalable framework: informed priors to guide topic discovery, and topic modeling over a multilingual corpus.
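To make the idea of distributing variational inference concrete, here is a minimal single-process sketch. It assumes the standard mean-field updates for LDA and one plausible division of labor: a map step that fits the per-document variational parameters and emits expected topic-word counts, and a reduce step that sums those counts into the topic parameters. The function names (`map_document`, `reduce_stats`), the symmetric hyperparameters `alpha` and `eta`, and the fixed iteration count are all illustrative assumptions; this is not the authors' code and may differ from how Mr. LDA actually partitions the work.

```python
import math
from collections import defaultdict


def digamma(x):
    """Digamma for x > 0: upward recurrence, then an asymptotic series."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    series = inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))
    return result + math.log(x) - 0.5 / x - series


def _responsibilities(weights_theta, exp_elog_beta, w):
    """Normalized phi over topics for word w (hypothetical helper)."""
    phi = [t * row[w] for t, row in zip(weights_theta, exp_elog_beta)]
    norm = sum(phi)
    return [p / norm for p in phi]


def map_document(doc, exp_elog_beta, alpha, iters=20):
    """Map step (assumed): per-document variational updates.

    doc maps word id -> count; exp_elog_beta[k][w] is exp(E[log beta_kw]).
    Emits ((topic, word), expected_count) pairs.
    """
    num_topics = len(exp_elog_beta)
    gamma = [alpha + sum(doc.values()) / num_topics] * num_topics
    for _ in range(iters):
        weights = [math.exp(digamma(g)) for g in gamma]
        new_gamma = [alpha] * num_topics
        for w, count in doc.items():
            phi = _responsibilities(weights, exp_elog_beta, w)
            for k in range(num_topics):
                new_gamma[k] += count * phi[k]
        gamma = new_gamma
    # Emit sufficient statistics using the final gamma.
    weights = [math.exp(digamma(g)) for g in gamma]
    pairs = []
    for w, count in doc.items():
        phi = _responsibilities(weights, exp_elog_beta, w)
        pairs.extend(((k, w), count * phi[k]) for k in range(num_topics))
    return pairs


def reduce_stats(pairs, eta):
    """Reduce step (assumed): sum counts per (topic, word), add the prior."""
    totals = defaultdict(float)
    for key, value in pairs:
        totals[key] += value
    return {key: eta + total for key, total in totals.items()}


# Toy corpus: two documents over a three-word vocabulary, two topics.
docs = [{0: 2, 1: 1}, {1: 1, 2: 3}]
exp_elog_beta = [[0.5, 0.3, 0.2], [0.2, 0.3, 0.5]]
pairs = [p for d in docs for p in map_document(d, exp_elog_beta, alpha=0.1)]
lam = reduce_stats(pairs, eta=0.01)
```

Because each document's updates depend only on the shared topic parameters, the map calls are independent and could run on separate workers; only the summed statistics have to be brought together. In this sketch, a full run would repeat the map and reduce steps, recomputing `exp_elog_beta` from `lam` each round, until the variational bound stops improving.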

research
03/04/2017

Autoencoding Variational Inference For Topic Models

Topic models are one of the most popular methods for learning representa...
research
02/07/2019

Towards Autoencoding Variational Inference for Aspect-based Opinion Summary

Aspect-based Opinion Summary (AOS), consisting of aspect discovery and s...
research
06/13/2019

Topic Modeling via Full Dependence Mixtures

We consider the topic modeling problem for large datasets. For this prob...
research
05/24/2016

Computing Web-scale Topic Models using an Asynchronous Parameter Server

Topic models such as Latent Dirichlet Allocation (LDA) have been widely ...
research
12/10/2015

Scalable Modeling of Conversational-role based Self-presentation Characteristics in Large Online Forums

Online discussion forums are complex webs of overlapping subcommunities ...
research
06/26/2015

An Empirical Study of Stochastic Variational Algorithms for the Beta Bernoulli Process

Stochastic variational inference (SVI) is emerging as the most promising...
research
05/01/2017

Stochastic Divergence Minimization for Biterm Topic Model

As the emergence and the thriving development of social networks, a huge...
