Topic Modeling on Podcast Short-Text Metadata

01/12/2022
by Francisco B. Valero, et al.

Podcasts have emerged as a massively consumed form of online content, notably due to the wider accessibility of production means and scaled distribution through large streaming platforms. Categorization systems and information access technologies typically use topics as the primary way to organize or navigate podcast collections. However, annotating podcasts with topics remains problematic because the assigned editorial genres are broad, heterogeneous or misleading, or because of data challenges (e.g. short metadata text, noisy transcripts). Here, we assess the feasibility of discovering relevant topics from podcast metadata, namely titles and descriptions, using topic modeling techniques for short text. We also propose a new strategy to leverage named entities (NEs), often present in podcast metadata, in a Non-negative Matrix Factorization (NMF) topic modeling framework. Our experiments on two existing datasets, from Spotify and iTunes, and on Deezer, a new dataset from an online service providing a catalog of podcasts, show that our proposed document representation, NEiCE, leads to improved topic coherence over the baselines. We release the code for experimental reproducibility of the results.
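To make the NMF framing concrete, the following is a minimal sketch of plain NMF topic modeling on short texts. It is not the NEiCE representation proposed in the paper and does not use named entities; it only illustrates the underlying factorization of a term-document matrix V into a term-topic matrix W and a topic-document matrix H. The toy titles, the function names, and the choice of multiplicative updates with a Frobenius loss are all assumptions made for illustration.

```python
import numpy as np

def build_term_doc_matrix(docs):
    """Return (vocab, V) where V[i, j] counts term i in document j."""
    vocab = sorted({w for d in docs for w in d.lower().split()})
    index = {w: i for i, w in enumerate(vocab)}
    V = np.zeros((len(vocab), len(docs)))
    for j, d in enumerate(docs):
        for w in d.lower().split():
            V[index[w], j] += 1.0
    return vocab, V

def nmf(V, n_topics, n_iter=200, seed=0, eps=1e-9):
    """Approximate V by W @ H using multiplicative updates (Frobenius loss)."""
    rng = np.random.default_rng(seed)
    W = rng.random((V.shape[0], n_topics)) + eps  # term-topic weights
    H = rng.random((n_topics, V.shape[1])) + eps  # topic-document weights
    for _ in range(n_iter):
        # Elementwise updates keep both factors non-negative.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

def top_words(W, vocab, k=3):
    """List the k highest-weighted terms for each topic (column of W)."""
    return [[vocab[i] for i in np.argsort(-W[:, t])[:k]]
            for t in range(W.shape[1])]

# Hypothetical podcast titles, invented for this example only.
docs = [
    "football match analysis weekly",
    "football transfer news weekly",
    "true crime murder investigation",
    "crime investigation cold case",
]
vocab, V = build_term_doc_matrix(docs)
W, H = nmf(V, n_topics=2)
topics = top_words(W, vocab)
```

Each inner list in `topics` holds the top-weighted words of one topic. With such short documents the count matrix is very sparse, which is the difficulty that short-text topic modeling techniques aim to address.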

research
12/01/2021

Topic Analysis of Superconductivity Literature by Semantic Non-negative Matrix Factorization

We utilize a recently developed topic modeling method called SeNMFk, ext...
research
05/26/2022

Federated Non-negative Matrix Factorization for Short Texts Topic Modeling with Mutual Information

Non-negative matrix factorization (NMF) based topic modeling is widely u...
research
10/09/2020

Paying down metadata debt: learning the representation of concepts using topic models

We introduce a data management problem called metadata debt, to identify...
research
08/21/2022

SeNMFk-SPLIT: Large Corpora Topic Modeling by Semantic Non-negative Matrix Factorization with Automatic Model Selection

As the amount of text data continues to grow, topic modeling is serving ...
research
05/01/2020

Minimally Supervised Categorization of Text with Metadata

Document categorization, which aims to assign a topic label to each docu...
research
04/06/2021

Exploring Topic-Metadata Relationships with the STM: A Bayesian Approach

Topic models such as the Structural Topic Model (STM) estimate latent to...
research
03/24/2021

Improving Editorial Workflow and Metadata Quality at Springer Nature

Identifying the research topics that best describe the scope of a scient...
