Concatenated p-mean Word Embeddings as Universal Cross-Lingual Sentence Representations

03/04/2018
by   Andreas Rücklé, et al.

Average word embeddings are a common baseline for more sophisticated sentence embedding techniques. An important advantage of average word embeddings is their computational and conceptual simplicity. However, they typically fall short of the performance of more complex models such as InferSent. Here, we generalize the concept of average word embeddings to p-mean word embeddings, which are (almost) as efficiently computable. We show that concatenating different types of p-mean word embeddings considerably closes the gap to state-of-the-art methods such as InferSent monolingually, and substantially outperforms these more complex techniques cross-lingually. In addition, our proposed method outperforms recently proposed baselines such as SIF and Sent2Vec by a solid margin, thus constituting a much harder-to-beat monolingual baseline for a wide variety of transfer tasks. Our data and code are publicly available.
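The core idea can be sketched in a few lines of NumPy: compute several power means (p-means) of a sentence's word vectors and concatenate them into one sentence embedding. This is an illustrative sketch, not the authors' released code; the function names and the particular choice of p values here (arithmetic mean, min, max, and p = 3) are assumptions for demonstration.

```python
import numpy as np

def p_mean(vectors, p):
    """Elementwise power mean of a stack of word vectors (axis 0).

    p = 1 is the ordinary average; p = +inf / -inf are the elementwise
    max / min; for odd integer p, x**p preserves sign, so the real
    p-th root of the mean is well defined.
    """
    x = np.asarray(vectors, dtype=float)
    if p == float("inf"):
        return x.max(axis=0)
    if p == float("-inf"):
        return x.min(axis=0)
    if p == 1:
        return x.mean(axis=0)
    m = np.mean(x ** p, axis=0)
    return np.sign(m) * np.abs(m) ** (1.0 / p)

def concat_p_mean_embedding(vectors, ps=(1, float("-inf"), float("inf"), 3)):
    """Concatenate several p-means into one sentence embedding."""
    return np.concatenate([p_mean(vectors, p) for p in ps])

# Toy sentence of three 2-dimensional word vectors.
words = np.array([[0.1, -0.2],
                  [0.3,  0.4],
                  [-0.5, 0.6]])
emb = concat_p_mean_embedding(words)
# Four p-means of 2-dim vectors yield an 8-dim sentence embedding.
```

Because each p-mean is a cheap elementwise reduction over the word vectors, the concatenated representation remains nearly as fast to compute as plain averaging, while capturing more of the distribution of word vectors than the mean alone.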


Related research

- 08/18/2016 — A Strong Baseline for Learning Cross-Lingual Word Embeddings from Sentence Alignments: "While cross-lingual word embeddings have been studied extensively in rec..."
- 11/09/2016 — A Comparison of Word Embeddings for English and Cross-Lingual Chinese Word Sense Disambiguation: "Word embeddings are now ubiquitous forms of word representation in natur..."
- 12/28/2019 — Robust Cross-lingual Embeddings from Parallel Sentences: "Recent advances in cross-lingual word embeddings have primarily relied o..."
- 11/01/2018 — Learning Unsupervised Word Mapping by Maximizing Mean Discrepancy: "Cross-lingual word embeddings aim to capture common linguistic regularit..."
- 03/25/2022 — Probabilistic Embeddings with Laplacian Graph Priors: "We introduce probabilistic embeddings using Laplacian priors (PELP). The..."
- 01/13/2020 — Visual Storytelling via Predicting Anchor Word Embeddings in the Stories: "We propose a learning model for the task of visual storytelling. The mai..."
- 03/03/2016 — MGNC-CNN: A Simple Approach to Exploiting Multiple Word Embeddings for Sentence Classification: "We introduce a novel, simple convolution neural network (CNN) architectu..."
