Automatic Generation of Topic Labels

05/29/2020
by Areej Alokaili, et al.
Topic modelling is a popular unsupervised method for identifying the underlying themes in document collections that has many applications in information retrieval. A topic is usually represented by a list of terms ranked by their probability but, since these can be difficult to interpret, various approaches have been developed to assign descriptive labels to topics. Previous work on the automatic assignment of labels to topics has relied on a two-stage approach: (1) candidate labels are retrieved from a large pool (e.g. Wikipedia article titles); and then (2) re-ranked based on their semantic similarity to the topic terms. However, these extractive approaches can only assign candidate labels from a restricted set that may not include any suitable ones. This paper proposes a neural sequence-to-sequence approach that generates labels directly and therefore does not suffer from this limitation. The model is trained over a new large synthetic dataset created using distant supervision. The method is evaluated by comparing the labels it generates to ones rated by humans.
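
The two-stage extractive approach described in the abstract can be illustrated with a minimal sketch. This is not the authors' code or any prior system's implementation: the candidate pool `POOL`, the function names, and the use of bag-of-words cosine similarity as a stand-in for semantic similarity are all assumptions made for illustration. The sketch also shows the limitation the abstract points out: a topic with no suitable candidate in the pool receives no label.

```python
from collections import Counter
from math import sqrt

# Hypothetical candidate pool: label title -> short description.
# A real system would draw on a large pool such as Wikipedia article titles.
POOL = {
    "Stock market": "stock market shares trading investors price",
    "Association football": "football team league goal player match",
    "Machine learning": "learning model data algorithm training",
}


def tokenize(text):
    """Lowercase and split on whitespace (deliberately simplistic)."""
    return text.lower().split()


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def label_topic(topic_terms, candidate_pool, top_k=3):
    """Return up to top_k labels from candidate_pool for the given topic terms."""
    topic_set = set(topic_terms)
    topic_vec = Counter(topic_terms)

    # Stage 1: retrieve candidates sharing at least one term with the topic.
    retrieved = [
        title
        for title, description in candidate_pool.items()
        if topic_set & set(tokenize(description))
    ]

    # Stage 2: re-rank retrieved candidates by similarity to the topic terms.
    # Bag-of-words cosine is an assumed stand-in for semantic similarity.
    scored = [
        (cosine(topic_vec, Counter(tokenize(candidate_pool[title]))), title)
        for title in retrieved
    ]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [title for _, title in scored[:top_k]]


finance_topic = ["market", "stock", "price", "investors", "bank"]
physics_topic = ["quantum", "particle", "energy"]

finance_labels = label_topic(finance_topic, POOL)
physics_labels = label_topic(physics_topic, POOL)  # nothing suitable in the pool
```

Because every label must come from the pool, the physics topic gets no label at all here, which is the gap a generative model is meant to close.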


