A Link between Coding Theory and Cross-Validation with Applications

03/22/2021
by Tapio Pahikkala, et al.

We study the combinatorics of cross-validation-based AUC estimation under the null hypothesis that the binary class labels are exchangeable, that is, the data are randomly assigned into two classes given a fixed class proportion. In particular, we study how estimators based on leave-pair-out cross-validation (LPOCV), in which every possible pair of data with different class labels is held out from the training set at a time, behave under the null without any prior assumptions on the learning algorithm or the data. It is shown that the maximal number of different fixed-proportion label assignments on a sample of data, for which a learning algorithm can achieve zero LPOCV error, is the maximal size of a constant weight error-correcting code whose length is the sample size, whose weight is the number of data labeled with one, and whose Hamming distance between code words is four. We then introduce the concept of a light constant weight code and show similar results for nonzero LPOCV errors. We also prove both upper and lower bounds on the maximal sizes of the light constant weight codes that are similar to the classical results for constant weight codes. These results pave the way towards the design of new LPOCV-based statistical tests of a learning algorithm's ability to distinguish two classes from each other, analogous to the classical Wilcoxon-Mann-Whitney U test for fixed functions. The behavior of some representative examples of learning algorithms and data is simulated in an experimental case study.
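As a rough illustration of the quantities discussed above, the sketch below computes an LPOCV AUC estimate, holding out every (positive, negative) pair in turn, and checks whether a set of fixed-weight label assignments has pairwise Hamming distance at least four, mirroring the constant weight code condition stated in the abstract. The nearest-mean scorer, the function names, and the toy data are illustrative assumptions, not taken from the paper.

```python
# Minimal sketch of LPOCV AUC estimation and the Hamming-distance-four check.
# The nearest-mean scorer stands in for an arbitrary learning algorithm.
import itertools
import numpy as np

def nearest_mean_score(X_train, y_train, X_test):
    """Score test points by distance to class means (higher => more 'positive').
    Assumes both classes remain non-empty in the training split."""
    mu_pos = X_train[y_train == 1].mean(axis=0)
    mu_neg = X_train[y_train == 0].mean(axis=0)
    # Positive when the point is closer to the class-1 mean than to the class-0 mean.
    return np.sum((X_test - mu_neg) ** 2, axis=1) - np.sum((X_test - mu_pos) ** 2, axis=1)

def lpocv_auc(X, y, score_fn=nearest_mean_score):
    """LPOCV AUC: hold out every (positive, negative) pair, retrain on the rest,
    and count the pairs in which the held-out positive outranks the negative."""
    pos = np.flatnonzero(y == 1)
    neg = np.flatnonzero(y == 0)
    wins = 0.0
    for i, j in itertools.product(pos, neg):
        train = np.setdiff1d(np.arange(len(y)), [i, j])
        s = score_fn(X[train], y[train], X[[i, j]])
        wins += 1.0 if s[0] > s[1] else (0.5 if s[0] == s[1] else 0.0)
    return wins / (len(pos) * len(neg))

def is_distance_four_code(labelings):
    """Check whether a set of 0/1 label vectors of equal weight has pairwise
    Hamming distance >= 4, i.e. forms a constant weight code as in the abstract."""
    L = np.asarray(labelings)
    return all(np.sum(a != b) >= 4 for a, b in itertools.combinations(L, 2))

# Toy usage on two well-separated Gaussian classes.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (10, 2)), rng.normal(3.0, 1.0, (10, 2))])
y = np.array([0] * 10 + [1] * 10)
print(lpocv_auc(X, y))                       # close to 1 for separable classes
print(is_distance_four_code([[1, 1, 0, 0, 0],
                             [0, 0, 1, 1, 0]]))  # True: Hamming distance 4
```

A zero LPOCV error corresponds to an AUC estimate of one in this sketch, which is why counting the label assignments attaining it reduces to the constant weight code question described above.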


