Inductive Document Network Embedding with Topic-Word Attention

01/10/2020
by Robin Brochier, et al.

Document network embedding aims at learning representations for a structured text corpus, i.e., one in which documents are linked to each other. Recent algorithms extend network embedding approaches by incorporating the text content associated with the nodes into their formulations. In most cases, the learned representations are hard to interpret. Moreover, little attention is given to generalization to new documents that are not observed within the network. In this paper, we propose an interpretable and inductive document network embedding method. We introduce a novel mechanism, the Topic-Word Attention (TWA), that generates document representations based on the interplay between word and topic representations. We train these word and topic vectors through our general model, Inductive Document Network Embedding (IDNE), by leveraging the connections in the document network. Quantitative evaluations show that our approach achieves state-of-the-art performance on various networks, and we qualitatively show that our model produces meaningful and interpretable representations of the words, topics and documents.
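
The abstract does not give the TWA formulation, so the following is only a minimal sketch of one plausible word-topic attention, not the authors' method. It assumes that each word spreads its weight over the topics through a softmax of word-topic dot products, and that each topic then averages the word vectors using those weights. The function name `topic_word_attention`, the softmax scoring, and the averaging step are all illustrative assumptions.

```python
import math
from typing import List, Tuple

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def topic_word_attention(
    word_vecs: List[Vector], topic_vecs: List[Vector]
) -> Tuple[List[float], List[Vector]]:
    """Hypothetical attention between the words of one document and the topics.

    Assumption (not taken from the paper): word i attends to topic k with
    weight softmax_k(word_i . topic_k). Returns the document's topic
    weights and one topic-specific document vector per topic.
    """
    n_words = len(word_vecs)
    n_topics = len(topic_vecs)
    dim = len(word_vecs[0])

    # attention[i][k]: share of word i assigned to topic k (rows sum to 1).
    attention = [
        softmax([dot(w, t) for t in topic_vecs]) for w in word_vecs
    ]

    # Document-level topic weights: average attention each topic receives.
    topic_weights = [
        sum(attention[i][k] for i in range(n_words)) / n_words
        for k in range(n_topics)
    ]

    # Topic-specific document vectors: attention-weighted mean of word vectors.
    topic_doc_vecs = [
        [
            sum(attention[i][k] * word_vecs[i][d] for i in range(n_words)) / n_words
            for d in range(dim)
        ]
        for k in range(n_topics)
    ]
    return topic_weights, topic_doc_vecs
```

Because the topic weights and the per-word attention rows are exposed directly, a sketch like this makes it possible to inspect which topics a document activates and which words drive each topic, which is the kind of interpretability the abstract refers to. Since only word and topic vectors are needed, the same function also applies to a document that was never part of the network.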


research
03/03/2022

Representing Mixtures of Word Embeddings with Mixtures of Topic Embeddings

A topic model is often formulated as a generative model that explains ho...
research
02/28/2019

Global Vectors for Node Representations

Most network embedding algorithms consist in measuring co-occurrences of...
research
05/06/2016

Mixing Dirichlet Topic Models and Word Embeddings to Make lda2vec

Distributed dense word vectors have been shown to be effective at captur...
research
08/12/2022

Scholastic: Graphical Human-AI Collaboration for Inductive and Interpretive Text Analysis

Interpretive scholars generate knowledge from text corpora by manually s...
research
12/02/2020

TAN-NTM: Topic Attention Networks for Neural Topic Modeling

Topic models have been widely used to learn representations from text an...
research
04/08/2019

Crosslingual Document Embedding as Reduced-Rank Ridge Regression

There has recently been much interest in extending vector-based word rep...
research
01/06/2020

Semantic Sensitive TF-IDF to Determine Word Relevance in Documents

Keyword extraction has received an increasing attention as an important ...
