Statistical ranking and combinatorial Hodge theory

11/07/2008
by Xiaoye Jiang, et al.

We propose a number of techniques for obtaining a global ranking from data that may be incomplete and imbalanced -- characteristics almost universal to modern datasets coming from e-commerce and internet applications. We are primarily interested in score or rating-based cardinal data. From raw ranking data, we construct pairwise rankings, represented as edge flows on an appropriate graph. Our statistical ranking method uses the graph Helmholtzian, the graph-theoretic analogue of the Helmholtz operator or vector Laplacian, in much the same way the graph Laplacian is an analogue of the Laplace operator or scalar Laplacian. We study the graph Helmholtzian using combinatorial Hodge theory: we show that every edge flow representing pairwise ranking can be resolved into two orthogonal components, a gradient flow that represents the L2-optimal global ranking and a divergence-free flow (cyclic) that measures the validity of the global ranking obtained -- if this is large, then the data does not have a meaningful global ranking. This divergence-free flow can be further decomposed orthogonally into a curl flow (locally cyclic) and a harmonic flow (locally acyclic but globally cyclic); these provide information on whether inconsistency arises locally or globally. An obvious advantage over the NP-hard Kemeny optimization is that discrete Hodge decomposition may be computed via a linear least squares regression. We also investigate the L1-projection of edge flows, showing that this is dual to correlation maximization over bounded divergence-free flows, and the L1-approximate sparse cyclic ranking, showing that this is dual to correlation maximization over bounded curl-free flows. We discuss relations with Kemeny optimization, Borda count, and Kendall-Smith consistency index from social choice theory and statistics.
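
The least-squares computation mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `hodge_decompose`, the unweighted least squares, the sign convention (grad s)(i, j) = s_j - s_i, and the choice of triangles as every 3-clique of the graph are all assumptions made for illustration. The sketch fits a score vector to an edge flow, then splits the residual into a part spanned by triangle cycles (curl) and a remainder (harmonic).

```python
import numpy as np
from itertools import combinations


def hodge_decompose(n, edges, flow):
    """Split an edge flow into gradient, curl, and harmonic parts.

    Hypothetical helper for illustration only.
    n     : number of items (nodes), labeled 0..n-1
    edges : list of (i, j) pairs with i < j
    flow  : flow[e] is the pairwise score on edges[e]; a positive value
            means j is preferred over i (assumed sign convention)
    """
    edges = [tuple(e) for e in edges]
    index = {e: k for k, e in enumerate(edges)}
    m = len(edges)
    y = np.asarray(flow, dtype=float)

    # Gradient operator: (grad s)(i, j) = s_j - s_i.
    grad = np.zeros((m, n))
    for k, (i, j) in enumerate(edges):
        grad[k, i] = -1.0
        grad[k, j] = 1.0

    # Curl operator: one row per triangle i < j < k present in the graph,
    # (curl X)(i, j, k) = X_ij + X_jk - X_ik.
    triangles = [
        (i, j, k)
        for i, j, k in combinations(range(n), 3)
        if (i, j) in index and (j, k) in index and (i, k) in index
    ]
    curl = np.zeros((len(triangles), m))
    for r, (i, j, k) in enumerate(triangles):
        curl[r, index[(i, j)]] = 1.0
        curl[r, index[(j, k)]] = 1.0
        curl[r, index[(i, k)]] = -1.0

    # Global ranking: least-squares fit of grad s to the observed flow.
    s, *_ = np.linalg.lstsq(grad, y, rcond=None)
    s = s - s.mean()  # scores are only defined up to an additive constant
    gradient_part = grad @ s

    # The residual is divergence-free; project it onto triangle cycles.
    residual = y - gradient_part
    if triangles:
        z, *_ = np.linalg.lstsq(curl.T, residual, rcond=None)
        curl_part = curl.T @ z
    else:
        curl_part = np.zeros(m)
    harmonic_part = residual - curl_part
    return s, gradient_part, curl_part, harmonic_part


# A flow that is cyclic around one triangle: 0 -> 1 -> 2 -> 0.
s, g, c, h = hodge_decompose(3, [(0, 1), (1, 2), (0, 2)], [1.0, 1.0, -1.0])
print("scores:", s)
print("norms (gradient, curl, harmonic):",
      np.linalg.norm(g), np.linalg.norm(c), np.linalg.norm(h))
```

In this sketch, comparing the norm of the gradient part with the norms of the curl and harmonic parts gives a rough reading of how much of the data a single global ranking explains. When the flow comes exactly from a score vector, both cyclic parts vanish.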

research
09/04/2023

Representing Edge Flows on Graphs via Sparse Cell Complexes

Obtaining sparse, interpretable representations of observable data is cr...
research
02/20/2019

On Polynomial-Time Combinatorial Algorithms for Maximum L-Bounded Flow

Given a graph G=(V,E) with two distinguished vertices s,t∈ V and an inte...
research
09/08/2009

On Ranking Senators By Their Votes

The problem of ranking a set of objects given some measure of similarity...
research
07/10/2018

Using Complex Network Theory for Temporal Locality in Network Traffic Flows

Monitoring the interaction behaviors of network traffic flows and detect...
research
10/27/2022

Ranking Edges by their Impact on the Spectral Complexity of Information Diffusion over Networks

Despite the numerous ways now available to quantify which parts or subsy...
research
03/19/2020

Faster Divergence Maximization for Faster Maximum Flow

In this paper we provide an algorithm which given any m-edge n-vertex di...
research
01/07/2021

Rankings for Bipartite Tournaments via Chain Editing

Ranking the participants of a tournament has applications in voting, pai...
