Encoding Prior Knowledge with Eigenword Embeddings

09/03/2015
by Dominique Osborne et al.

Canonical correlation analysis (CCA) is a method for reducing the dimension of data represented using two views. It has been previously used to derive word embeddings, where one view indicates a word, and the other view indicates its context. We describe a way to incorporate prior knowledge into CCA, give a theoretical justification for it, and test it by deriving word embeddings and evaluating them on a myriad of datasets.
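
To make this construction concrete, here is a minimal sketch (not the authors' implementation) of the standard eigenword recipe that the paper builds on: a one-step CCA between the word view and the context view, computed as a truncated SVD of the word-context co-occurrence matrix rescaled by inverse square roots of the marginal counts. The toy corpus, the window size, the embedding dimension, and the diagonal treatment of the view covariances are all illustrative assumptions.

    import numpy as np

    corpus = [["the", "cat", "sat", "on", "the", "mat"],
              ["the", "dog", "sat", "on", "the", "rug"]]
    window = 2  # context window size (assumption)
    k = 3       # embedding dimension (assumption)

    vocab = sorted({w for sent in corpus for w in sent})
    idx = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)

    # Word-context co-occurrence counts C_wc (contexts = words within the window).
    C = np.zeros((V, V))
    for sent in corpus:
        for i, w in enumerate(sent):
            for j in range(max(0, i - window), min(len(sent), i + window + 1)):
                if j != i:
                    C[idx[w], idx[sent[j]]] += 1.0

    # Prior knowledge could be folded in here, e.g. by blending C with counts
    # derived from a lexical resource: C = (1 - alpha) * C + alpha * C_prior.
    # This blending is an illustrative stand-in, not the paper's exact method.

    # Rescale by inverse square roots of the marginal counts. This approximates
    # whitening by C_ww^{-1/2} and C_cc^{-1/2} with those covariances treated
    # as diagonal, the usual simplification in eigenword derivations.
    row = C.sum(axis=1)
    col = C.sum(axis=0)
    S = C / np.sqrt(np.outer(row, col) + 1e-12)

    # Truncated SVD: the top-k left singular vectors give the word embeddings.
    U, sigma, Vt = np.linalg.svd(S, full_matrices=False)
    embeddings = U[:, :k]
    for w in vocab:
        print(w, np.round(embeddings[idx[w]], 3))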

