Metric Learning in Multilingual Sentence Similarity Measurement for Document Alignment

08/21/2021
by   Charith Rajitha, et al.
0

Document alignment techniques based on multilingual sentence representations have recently shown state of the art results. However, these techniques rely on unsupervised distance measurement techniques, which cannot be fined-tuned to the task at hand. In this paper, instead of these unsupervised distance measurement techniques, we employ Metric Learning to derive task-specific distance measurements. These measurements are supervised, meaning that the distance measurement metric is trained using a parallel dataset. Using a dataset belonging to English, Sinhala, and Tamil, which belong to three different language families, we show that these task-specific supervised distance learning metrics outperform their unsupervised counterparts, for document alignment.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
04/15/2021

Bilingual alignment transfers to multilingual alignment for unsupervised parallel text mining

This work presents methods for learning cross-lingual sentence represent...
research
06/12/2021

Exploiting Parallel Corpora to Improve Multilingual Embedding based Document and Sentence Alignment

Multilingual sentence representations pose a great advantage for low-res...
research
01/27/2021

Supervised Tree-Wasserstein Distance

To measure the similarity of documents, the Wasserstein distance is a po...
research
01/31/2020

Massively Multilingual Document Alignment with Cross-lingual Sentence-Mover's Distance

Cross-lingual document alignment aims to identify pairs of documents in ...
research
04/29/2020

A Supervised Word Alignment Method based on Cross-Language Span Prediction using Multilingual BERT

We present a novel supervised word alignment method based on cross-langu...
research
05/03/2017

Lifelong Metric Learning

The state-of-the-art online learning approaches are only capable of lear...
research
08/30/2018

Discriminative Learning of Similarity and Group Equivariant Representations

One of the most fundamental problems in machine learning is to compare e...

Please sign up or login with your details

Forgot password? Click here to reset