An Optimality Proof for the PairDiff operator for Representing Relations between Words
Representing the semantic relations that exist between two given words (or entities) is an important first step in a wide range of NLP applications such as analogical reasoning, knowledge base completion and relational information retrieval. A simple, yet surprisingly accurate method for representing a relation between two words is to compute the vector offset (PairDiff) between the corresponding word embeddings. Despite its empirical success, it remains unclear whether PairDiff is the best operator for obtaining a relational representation from word embeddings. In this paper, we conduct a theoretical analysis of the PairDiff operator. In particular, we show that for word embeddings where cross-dimensional correlations are zero, PairDiff is the only bilinear operator that can minimise the ℓ_2 loss between analogous word-pairs. We experimentally show that for word embeddings created using a broad range of methods, the cross-dimensional correlations are approximately zero, demonstrating the general applicability of our theoretical result. Moreover, we empirically verify the implications of the proven theoretical result in a series of experiments where we repeatedly discover PairDiff as the best bilinear operator for representing semantic relations between words in several benchmark datasets.
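As an illustration of the two quantities the abstract refers to, the following is a minimal sketch rather than the authors' implementation. It assumes the offset is taken as a − b (the abstract does not fix the sign convention), and the three-dimensional embedding values and the helper names `pair_diff`, `l2_distance` and `dimension_correlation` are hypothetical, chosen only for this example.

```python
import math


def pair_diff(a, b):
    """Vector offset a - b (sign convention assumed for this sketch)."""
    return [x - y for x, y in zip(a, b)]


def l2_distance(u, v):
    """Euclidean (l2) distance between two relation vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


def dimension_correlation(vectors, i, j):
    """Pearson correlation between dimensions i and j across all vectors.

    Raises ZeroDivisionError if either dimension is constant.
    """
    xs = [v[i] for v in vectors]
    ys = [v[j] for v in vectors]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (std_x * std_y)


# Hypothetical toy embeddings, invented for illustration only.
embeddings = {
    "man": [1.0, 0.0, 2.0],
    "woman": [1.0, 1.0, 2.0],
    "king": [3.0, 0.0, 1.0],
    "queen": [3.0, 1.0, 1.0],
    "apple": [0.0, 2.0, 5.0],
}

rel_man_woman = pair_diff(embeddings["man"], embeddings["woman"])
rel_king_queen = pair_diff(embeddings["king"], embeddings["queen"])
rel_king_apple = pair_diff(embeddings["king"], embeddings["apple"])

# Analogous pairs should lie closer together than non-analogous ones.
analogous_gap = l2_distance(rel_man_woman, rel_king_queen)
unrelated_gap = l2_distance(rel_man_woman, rel_king_apple)
```

In this toy setting the two analogous pairs yield the same offset, so their ℓ_2 distance is smaller than the distance to the unrelated pair. `dimension_correlation` shows one way the cross-dimensional correlation of a set of embeddings could be estimated for a chosen pair of dimensions.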