Local nearest neighbour classification with applications to semi-supervised learning

04/03/2017
by Timothy I. Cannings, et al.

We derive a new asymptotic expansion for the global excess risk of a local k-nearest neighbour classifier, where the choice of k may depend upon the test point. This expansion elucidates conditions under which the dominant contribution to the excess risk comes from the locus of points at which each class label is equally likely to occur, but we also show that if these conditions are not satisfied, the dominant contribution may arise from the tails of the marginal distribution of the features. Moreover, we prove that, provided the d-dimensional marginal distribution of the features has a finite ρth moment for some ρ > 4 (as well as other regularity conditions), a local choice of k can yield a rate of convergence of the excess risk of O(n^{-4/(d+4)}), where n is the sample size, whereas for the standard k-nearest neighbour classifier, our theory would require d ≥ 5 and ρ > 4d/(d-4) finite moments to achieve this rate. Our results motivate a new k-nearest neighbour classifier for semi-supervised learning problems, where the unlabelled data are used to obtain an estimate of the marginal feature density, and fewer neighbours are used for classification when this density estimate is small. The potential improvements over the standard k-nearest neighbour classifier are illustrated both through our theory and via a simulation study.
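The semi-supervised idea described in the abstract (estimate the feature density from unlabelled data, then use fewer neighbours where that estimate is small) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the Gaussian kernel density estimate, the fixed `bandwidth`, the helper names `gaussian_kde`, `local_k` and `local_knn_predict`, and the rule that k scales like (n × density)^{4/(d+4)} clipped to the range [1, n/2]. The exponent is chosen only to echo the rate quoted above.

```python
import math
from collections import Counter


def gaussian_kde(points, x, bandwidth):
    """Gaussian kernel density estimate at x from d-dimensional points."""
    d = len(x)
    norm = (2.0 * math.pi) ** (d / 2.0) * bandwidth ** d
    total = 0.0
    for p in points:
        sq_dist = sum((a - b) ** 2 for a, b in zip(p, x))
        total += math.exp(-0.5 * sq_dist / bandwidth ** 2)
    return total / (len(points) * norm)


def local_k(density, n, d, scale=1.0):
    """Hypothetical rule (an assumption, not the paper's definition):
    k grows like (n * density)^(4/(d+4)) and is clipped to [1, n // 2]."""
    k = int(scale * (n * density) ** (4.0 / (d + 4)))
    return max(1, min(k, max(1, n // 2)))


def local_knn_predict(x, labelled, unlabelled, bandwidth=0.5, scale=1.0):
    """Classify x by majority vote among its k(x) nearest labelled points.

    labelled   : list of (feature_tuple, label) pairs
    unlabelled : list of feature tuples, used only for the density estimate
    Returns (predicted_label, k_used).
    """
    n, d = len(labelled), len(x)
    density = gaussian_kde(unlabelled, x, bandwidth)
    k = local_k(density, n, d, scale)
    # Sort labelled points by Euclidean distance to x and keep the k nearest.
    neighbours = sorted(labelled, key=lambda fl: math.dist(fl[0], x))[:k]
    votes = Counter(label for _, label in neighbours)
    # On a tied vote, the label seen first (the nearer neighbour) wins.
    return votes.most_common(1)[0][0], k


if __name__ == "__main__":
    labelled = [((0.0, 0.0), 0), ((0.1, 0.0), 0), ((0.0, 0.1), 0),
                ((5.0, 5.0), 1), ((5.1, 5.0), 1), ((5.0, 5.1), 1)]
    unlabelled = [(0.05, 0.05), (0.02, 0.08), (5.05, 5.05)]
    print(local_knn_predict((0.0, 0.05), labelled, unlabelled))
    print(local_knn_predict((5.0, 5.0), labelled, unlabelled))
```

In this sketch a test point in a low-density region gets a density estimate near zero, so `local_k` falls back to a single neighbour, while points in dense regions vote over more neighbours. A practical version would also need a data-driven choice of `bandwidth` and `scale`, which is left open here.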


research
11/15/2017

Semi-Supervised Approaches to Efficient Evaluation of Model Prediction Performance

In many modern machine learning applications, the outcome is expensive o...
research
03/02/2016

Asymptotic behavior of ℓ_p-based Laplacian regularization in semi-supervised learning

Given a weighted graph with N vertices, consider a real-valued regressio...
research
05/29/2019

An adaptive nearest neighbor rule for classification

We introduce a variant of the k-nearest neighbor classifier in which k i...
research
05/28/2019

Semi-Supervised Learning, Causality and the Conditional Cluster Assumption

While the success of semi-supervised learning (SSL) is still not fully u...
research
04/05/2018

Semi-Supervised Classification for oil reservoir

This paper addresses the general problem of accurate identification of o...
research
07/11/2016

Minimum Description Length Principle in Supervised Learning with Application to Lasso

The minimum description length (MDL) principle in supervised learning is...
research
11/28/2011

Adaptive Semisupervised Inference

Semisupervised methods inevitably invoke some assumption that links the ...
