Two to Five Truths in Non-Negative Matrix Factorization

by John M. Conroy, et al.

In this paper, we explore the role of matrix scaling on a matrix of counts when building a topic model using non-negative matrix factorization. We present a scaling inspired by the normalized Laplacian (NL) for graphs that can greatly improve the quality of a non-negative matrix factorization. The results parallel those in the spectral graph clustering work of <cit.>, where the authors proved that spectral clustering with adjacency spectral embedding (ASE) was more likely to discover core-periphery partitions and spectral clustering with Laplacian spectral embedding (LSE) was more likely to discover affinity partitions. In text analysis, non-negative matrix factorization (NMF) is typically used on a matrix of co-occurrence counts of “contexts” and “terms”. The matrix scaling inspired by LSE gives significant improvement for text topic models in a variety of datasets. We illustrate the dramatic difference that matrix scaling in NMF can make to the quality of a topic model on three datasets where human annotation is available. Using the adjusted Rand index (ARI), a measure of cluster similarity, we see an increase of 50% for Twitter data and over 200% for a newsgroup dataset versus using counts, which is the analogue of ASE. For clean data, such as those from the Document Understanding Conference, NL gives over 40% improvement over ASE. We conclude with some analysis of this phenomenon and some connections of this scaling with other matrix scaling methods.
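To make the idea of scaling before factorizing concrete, here is a minimal sketch in Python. It assumes one common bipartite analogue of the normalized Laplacian scaling, dividing each count by the square roots of its row sum and column sum; the paper's exact formulation may differ. The function names `nl_scale` and `nmf`, the toy count matrix, and the choice of Lee-Seung multiplicative updates for the factorization are illustrative assumptions, not taken from the paper.

```python
import numpy as np


def nl_scale(counts, eps=1e-12):
    """Scale a non-negative count matrix A to D_r^{-1/2} A D_c^{-1/2}.

    D_r and D_c are the diagonal matrices of row sums and column sums.
    This is an assumed form of the NL-inspired scaling, for illustration.
    """
    counts = np.asarray(counts, dtype=float)
    row_sums = counts.sum(axis=1)
    col_sums = counts.sum(axis=0)
    # Guard against empty rows or columns; their entries are zero anyway.
    r = 1.0 / np.sqrt(np.maximum(row_sums, eps))
    c = 1.0 / np.sqrt(np.maximum(col_sums, eps))
    return counts * r[:, None] * c[None, :]


def nmf(X, rank, n_iter=200, seed=0, eps=1e-10):
    """Basic NMF with multiplicative updates for the Frobenius objective."""
    rng = np.random.default_rng(seed)
    n, m = X.shape
    W = rng.random((n, rank)) + eps
    H = rng.random((rank, m)) + eps
    for _ in range(n_iter):
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H


# Hypothetical toy data: rows are "contexts", columns are "terms".
counts = np.array([
    [5, 4, 0, 0],
    [6, 3, 1, 0],
    [0, 1, 4, 5],
    [0, 0, 3, 6],
])

scaled = nl_scale(counts)        # NL-style scaling (the LSE analogue)
W, H = nmf(scaled, rank=2)       # factorize the scaled matrix
labels = W.argmax(axis=1)        # assign each context to its strongest topic
```

Factorizing `counts` directly instead of `scaled` would correspond to the raw-count baseline that the abstract describes as the analogue of ASE. Where human annotations exist, the two label vectors could then be compared against them with an ARI implementation such as `sklearn.metrics.adjusted_rand_score`.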




