Concentration Inequalities for Two-Sample Rank Processes with Application to Bipartite Ranking

04/07/2021
by Stephan Clémençon, et al.

The ROC curve is the gold standard for measuring how well a test/scoring statistic discriminates between two statistical populations, in applications ranging from anomaly detection in signal processing to information retrieval and medical diagnosis. Most practical performance measures used in scoring/ranking applications, such as the AUC, the local AUC, the p-norm push and the DCG, among others, can be viewed as summaries of the ROC curve. This paper highlights the fact that most of these empirical criteria can be expressed as two-sample linear rank statistics, and proves concentration inequalities for collections of such random variables, referred to here as two-sample rank processes, when they are indexed by VC classes of scoring functions. Based on these nonasymptotic bounds, the generalization capacity of empirical maximizers of a wide class of ranking performance criteria is then investigated from a theoretical perspective. The theoretical analysis is also supported by empirical evidence from numerical experiments.
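
To make the rank-statistic viewpoint concrete, here is a minimal sketch, not taken from the paper, of the classical Mann-Whitney-Wilcoxon identity: the empirical AUC equals a shifted and rescaled sum of the ranks that the positive scores occupy in the pooled sample. The function names `rank_sum_auc` and `pairwise_auc` are hypothetical, and the sketch assumes there are no ties among the scores.

```python
import numpy as np


def rank_sum_auc(pos_scores, neg_scores):
    """Empirical AUC from the ranks of the positive scores in the pooled sample.

    Assumes all scores are distinct (no ties).
    """
    pos = np.asarray(pos_scores, dtype=float)
    neg = np.asarray(neg_scores, dtype=float)
    n, m = len(pos), len(neg)
    pooled = np.concatenate([pos, neg])
    # Assign ranks 1..N to the pooled scores (rank 1 = smallest score).
    order = np.argsort(pooled)
    ranks = np.empty(n + m)
    ranks[order] = np.arange(1, n + m + 1)
    # Two-sample linear rank statistic: sum of the ranks of the positives.
    rank_sum = ranks[:n].sum()
    # Mann-Whitney U, normalised by the number of (positive, negative) pairs.
    return (rank_sum - n * (n + 1) / 2) / (n * m)


def pairwise_auc(pos_scores, neg_scores):
    """Empirical AUC as the fraction of correctly ordered (positive, negative) pairs."""
    pos = np.asarray(pos_scores, dtype=float)
    neg = np.asarray(neg_scores, dtype=float)
    return float(np.mean(pos[:, None] > neg[None, :]))


if __name__ == "__main__":
    pos = [0.9, 0.8, 0.35]
    neg = [0.7, 0.3, 0.1]
    print(rank_sum_auc(pos, neg), pairwise_auc(pos, neg))
```

On tie-free scores the two functions agree, which is the sense in which the AUC is a (linear) function of the ranks of one sample within the pooled sample. The other criteria mentioned in the abstract are not covered by this sketch.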


