Learning Multilingual Topics from Incomparable Corpus

06/11/2018
by Shudong Hao, et al.

Multilingual topic models enable crosslingual tasks by extracting consistent topics from multilingual corpora. Most models require parallel or comparable training corpora, which limits their ability to generalize. In this paper, we first demystify the knowledge transfer mechanism behind multilingual topic models by defining an alternative but equivalent formulation. Based on this analysis, we then relax the training-data assumptions made by most existing models, creating a model that requires only a dictionary for training. Experiments show that our new method effectively learns coherent multilingual topics from partially and fully incomparable corpora with limited dictionary resources.
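The abstract's central claim is that a dictionary alone, with no aligned documents, can carry topic knowledge from one language to another. The snippet below is a minimal illustrative sketch of that general idea and is not the authors' model: it assumes a topic is represented as a word-to-probability mapping, and it uses a hypothetical helper `project_topic` that spreads each source word's probability evenly over its dictionary translations, drops words with no entry, and renormalizes. The toy topic and English-to-Spanish dictionary are invented for illustration.

```python
from collections import defaultdict


def project_topic(topic, dictionary):
    """Hypothetical helper (not from the paper): map a source-language topic
    (word -> probability) onto the target language by splitting each word's
    mass evenly among its dictionary translations. Words with no dictionary
    entry are dropped, and the result is renormalized to sum to 1."""
    projected = defaultdict(float)
    for word, prob in topic.items():
        translations = dictionary.get(word, [])
        for t in translations:
            projected[t] += prob / len(translations)
    total = sum(projected.values())
    if total == 0:
        # No word in the topic is covered by the dictionary.
        return {}
    return {w: p / total for w, p in projected.items()}


# Invented toy data: an English "politics" topic and a small English->Spanish dictionary.
topic_en = {"election": 0.5, "vote": 0.3, "senate": 0.2}
en_es = {"election": ["elección"], "vote": ["voto", "votar"]}

topic_es = project_topic(topic_en, en_es)
print(sorted(topic_es.items()))
```

Here "senate" has no dictionary entry, so its mass is lost and the remaining translations are renormalized. This illustrates why dictionary coverage matters when only a dictionary links the languages, which is the low-resource setting the abstract evaluates.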
