Simple and Effective Dimensionality Reduction for Word Embeddings

08/11/2017
by Vikas Raunak, et al.

Word embeddings have become the basic building blocks for several natural language processing and information retrieval tasks. Pre-trained word embeddings are used in several downstream applications as well as for constructing representations for sentences, paragraphs and documents. Recently, there has been an emphasis on further improving the pre-trained word vectors through post-processing algorithms. One such area of improvement is the dimensionality reduction of word embeddings. Reducing the size of word embeddings through dimensionality reduction can improve their utility in memory-constrained devices, benefiting several real-world applications. In this work, we present a novel algorithm that effectively combines PCA-based dimensionality reduction with a recently proposed post-processing algorithm to construct word embeddings of lower dimensions. Empirical evaluations on 12 standard word-similarity benchmarks show that our algorithm reduces the embedding dimensionality by 50% while matching, and in some cases exceeding, the performance of the higher-dimensional embeddings.
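The pipeline described above, combining a post-processing step (subtracting the mean and projecting out the top principal components of the embedding matrix) with a PCA projection to a lower dimension, can be sketched as follows. This is a minimal illustration, not the authors' released implementation; the choice of `target_dim=150` and `d=2` removed components is illustrative, and the helper names are our own:

```python
import numpy as np

def remove_top_components(X, d=2):
    # Post-processing step: subtract the mean vector, then project out the
    # top-d principal components of the centered embedding matrix, which
    # tend to carry information shared by all word vectors.
    X = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    top = Vt[:d]                        # (d, dim) principal directions
    return X - X @ top.T @ top

def reduce_dimensions(X, target_dim=150, d=2):
    # Sketch of the combined algorithm: post-process, PCA-reduce,
    # then post-process again in the lower-dimensional space.
    X = remove_top_components(X, d)
    X = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    X = X @ Vt[:target_dim].T           # project onto top target_dim PCs
    return remove_top_components(X, d)

# Toy usage: 1000 "word" vectors of dimension 300, halved to 150.
emb = np.random.default_rng(0).normal(size=(1000, 300))
reduced = reduce_dimensions(emb, target_dim=150)
print(reduced.shape)  # (1000, 150)
```

In practice the same function would be applied to real pre-trained vectors (e.g. GloVe or fastText matrices) rather than random data, and the reduced matrix would be evaluated on the word-similarity benchmarks.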


Related research

05/27/2019  An Empirical Study on Post-processing Methods for Word Embeddings
Word embeddings learnt from large corpora have been adopted in various a...

04/22/2021  On Geodesic Distances and Contextual Embedding Compression for Text Classification
In some memory-constrained settings like IoT devices and over-the-networ...

09/04/2020  Going Beyond T-SNE: Exposing whatlies in Text Embeddings
We introduce whatlies, an open source toolkit for visually inspecting wo...

10/25/2020  Autoencoding Improves Pre-trained Word Embeddings
Prior work investigating the geometry of pre-trained word embeddings hav...

06/20/2023  Unexplainable Explanations: Towards Interpreting tSNE and UMAP Embeddings
It has become standard to explain neural network latent spaces with attr...

10/07/2020  Less is more: Faster and better music version identification with embedding distillation
Version identification systems aim to detect different renditions of the...

10/01/2019  Specializing Word Embeddings (for Parsing) by Information Bottleneck
Pre-trained word embeddings like ELMo and BERT contain rich syntactic an...
