Why So Down? The Role of Negative (and Positive) Pointwise Mutual Information in Distributional Semantics
In distributional semantics, the pointwise mutual information (PMI) weighting of the cooccurrence matrix performs far better than raw counts. There is, however, an issue with unobserved pair cooccurrences, for which PMI goes to negative infinity. This problem is aggravated by unreliable statistics from finite corpora, which lead to a large number of such pairs. A common practice is to clip negative PMI (-PMI) at 0, a variant known as Positive PMI (PPMI). In this paper, we investigate alternative ways of dealing with -PMI and, more importantly, study the role that negative information plays in the performance of a low-rank, weighted factorization of different PMI matrices. Using various semantic and syntactic tasks as probes into models which use either negative or positive PMI (or both), we find that most of the encoded semantics and syntax come from positive PMI, in contrast to -PMI, which contributes almost exclusively syntactic information. Our findings deepen our understanding of distributional semantics, while also introducing novel PMI variants and grounding the popular PPMI measure.
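For readers unfamiliar with the measures under discussion, the following is a minimal NumPy sketch (not from the paper) of how PMI and its clipped variant PPMI are commonly computed from a word-by-context cooccurrence count matrix; the function name and masking of unseen pairs are illustrative choices, not the authors' implementation:

```python
import numpy as np

def pmi_matrix(counts, positive=True):
    """Compute a (P)PMI matrix from a word-by-context cooccurrence count matrix.

    PMI(w, c) = log( P(w, c) / (P(w) * P(c)) ). Unobserved pairs would yield
    log 0 = -inf, so they are masked to 0 here; with positive=True, remaining
    negative entries are also clipped at 0, giving PPMI.
    """
    counts = np.asarray(counts, dtype=float)
    total = counts.sum()
    p_wc = counts / total                      # joint probabilities P(w, c)
    p_w = p_wc.sum(axis=1, keepdims=True)      # word marginals P(w)
    p_c = p_wc.sum(axis=0, keepdims=True)      # context marginals P(c)
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log(p_wc / (p_w * p_c))
    pmi[counts == 0] = 0.0                     # mask -inf from unseen pairs
    if positive:
        pmi = np.maximum(pmi, 0.0)             # clip negative PMI -> PPMI
    return pmi
```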