Multi-source Neural Topic Modeling in Multi-view Embedding Spaces

04/17/2021
by   Pankaj Gupta, et al.

Although word embeddings and topics are complementary representations, most prior work has used only pretrained word embeddings in (neural) topic modeling to address data sparsity in short texts or small document collections. This work presents a novel neural topic modeling framework that uses multi-view embedding spaces: (1) pretrained topic embeddings and (2) pretrained word embeddings (context-insensitive from GloVe and context-sensitive from BERT), drawn jointly from one or many sources, to improve topic quality and better handle polysemy. To do so, we first build respective pools of pretrained topic embeddings (TopicPool) and word embeddings (WordPool). We then identify one or more relevant source domains and transfer knowledge from them to guide meaningful learning in a sparse target domain. Within neural topic modeling, we quantify the quality of topics and document representations via generalization (perplexity), interpretability (topic coherence), and information retrieval (IR) on short-text, long-text, small, and large document collections from the news and medical domains. Using the multi-source multi-view embedding spaces, we demonstrate state-of-the-art neural topic modeling on 6 source (high-resource) and 5 target (low-resource) corpora.
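As a rough illustration of the multi-view idea, the sketch below pools word embeddings from multiple sources and scores them against pooled pretrained topic embeddings to form topic-word distributions. This is a minimal toy sketch, not the paper's actual model: the pool names (WordPool, TopicPool) follow the paper, but the random stand-in embeddings, the averaging step, and the softmax scoring are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab = ["virus", "vaccine", "market", "stocks"]
V, d = len(vocab), 8

# WordPool: word embeddings from two hypothetical sources
# (e.g., a GloVe-like and a BERT-like view); random stand-ins here.
word_pool = [rng.normal(size=(V, d)) for _ in range(2)]

# TopicPool: pretrained topic embeddings from two source corpora,
# K topics each (random stand-ins).
K = 3
topic_pool = [rng.normal(size=(K, d)) for _ in range(2)]

def multi_view_topic_word(word_pool, topic_pool):
    """Average word embeddings across sources, stack topic embeddings
    from all sources, then score topic-word affinity via dot product
    and a softmax over the vocabulary."""
    W = np.mean(word_pool, axis=0)           # (V, d) pooled word view
    T = np.concatenate(topic_pool, axis=0)   # (S*K, d) pooled topic view
    logits = T @ W.T                         # (S*K, V) affinity scores
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)  # rows are distributions

beta = multi_view_topic_word(word_pool, topic_pool)
print(beta.shape)  # 2 sources x 3 topics over 4 words -> (6, 4)
```

In the actual framework such pooled views would guide a neural topic model's topic-word matrix during training on the sparse target corpus, rather than replace it.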


Related research:

09/14/2019  Multi-view and Multi-source Transfers in Neural Topic Modeling with Pretrained Topic and Word Embeddings
09/14/2019  Multi-view and Multi-source Transfers in Neural Topic Modeling
08/09/2017  Identifying Reference Spans: Topic Modeling and Word Embeddings help IR
06/19/2020  Neural Topic Modeling with Continual Lifelong Learning
06/05/2019  Topic Sensitive Attention on Generic Corpora Corrects Sense Bias in Pretrained Embeddings
10/05/2020  Improving Neural Topic Models using Knowledge Distillation
05/04/2021  Unsupervised Graph-based Topic Modeling from Video Transcriptions
