Nonparametric Relational Topic Models through Dependent Gamma Processes

03/30/2015
by Junyu Xuan, et al.

Traditional relational topic models provide a way to discover the hidden topics in a document network. Many theoretical and practical tasks, such as dimensionality reduction, document clustering, and link prediction, benefit from this revealed knowledge. However, existing relational topic models assume that the number of hidden topics is known in advance, which is impractical in many real-world applications. To relax this assumption, we propose a nonparametric relational topic model. Instead of using fixed-dimensional probability distributions in its generative model, we use stochastic processes. Specifically, a gamma process is assigned to each document and represents the topic interest of that document. Although this provides an elegant solution, it brings additional challenges when mathematically modeling the inherent network structure of a typical document network, i.e., two documents that are closer in the network tend to have more similar topics. Furthermore, we require that the topics be shared by all the documents. To resolve these challenges, we use a subsampling strategy to assign each document a different gamma process derived from the global gamma process, and the subsampling probabilities of the documents are subject to a Markov Random Field constraint that inherits the document network structure. Through the designed posterior inference algorithm, we can discover the hidden topics and their number simultaneously. Experimental results on both synthetic and real-world network datasets demonstrate that the model can learn the hidden topics and, more importantly, the number of topics.
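
The subsampling idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' model or inference algorithm: the global gamma process is approximated by a finite truncation, and the Markov Random Field constraint is replaced by a simple neighbor-averaging step that only mimics its effect of making linked documents keep similar topics. The function name and the parameters `num_atoms`, `alpha`, and `smooth` are hypothetical.

```python
import numpy as np


def sample_thinned_gamma_weights(adjacency, num_atoms=50, alpha=5.0,
                                 smooth=0.5, seed=0):
    """Finite-truncation sketch: each document keeps a random subset of the
    atoms (topics) of one shared, global gamma-distributed weight vector."""
    rng = np.random.default_rng(seed)
    num_docs = adjacency.shape[0]

    # Global measure: K atoms with Gamma(alpha / K, 1) weights, a standard
    # finite approximation of a gamma process with total mass alpha.
    global_weights = rng.gamma(shape=alpha / num_atoms, scale=1.0,
                               size=num_atoms)

    # Independent initial subsampling probabilities per (document, topic).
    probs = rng.beta(1.0, 1.0, size=(num_docs, num_atoms))

    # ASSUMPTION: a crude stand-in for the MRF constraint. Each document's
    # probabilities are pulled toward the mean of its neighbors' values.
    degree = adjacency.sum(axis=1, keepdims=True)
    neighbor_mean = np.where(degree > 0,
                             adjacency @ probs / np.maximum(degree, 1.0),
                             probs)
    probs = (1.0 - smooth) * probs + smooth * neighbor_mean

    # Thinning: a document keeps topic k with probability probs[d, k];
    # kept topics retain the global weight, dropped topics get weight 0.
    keep = rng.random((num_docs, num_atoms)) < probs
    doc_weights = keep * global_weights
    return global_weights, probs, doc_weights
```

Calling the function on a small chain network, for example `sample_thinned_gamma_weights(np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float))`, returns one shared weight vector and a per-document weight matrix whose nonzero entries all come from that shared vector, so the topics are shared while each document's selection differs.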


