Benefit of Interpolation in Nearest Neighbor Algorithms

09/25/2019
by Yue Xing, et al.

Over-parameterized models have attracted much attention in the era of data science and deep learning. It has been empirically observed that although such models, e.g., deep neural networks, over-fit the training data, they can still achieve small test error, and sometimes even outperform traditional algorithms designed to avoid over-fitting. The major goal of this work is to sharply quantify the benefit of data interpolation in the context of the nearest neighbors (NN) algorithm. Specifically, we consider a class of interpolated weighting schemes and carefully characterize their asymptotic performance. Our analysis reveals a U-shaped performance curve with respect to the level of data interpolation, and proves that a mild degree of data interpolation strictly improves both the prediction accuracy and the statistical stability over those of the (un-interpolated) optimal kNN algorithm. This theoretically justifies, and indeed predicts, the existence of the second U-shaped curve in the recently discovered double descent phenomenon. Note that our goal in this study is not to promote the use of the interpolated-NN method, but to obtain theoretical insight into data interpolation inspired by the aforementioned phenomenon.
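To make the notion of an interpolated weighting scheme concrete, below is a minimal Python sketch of one common interpolating rule, in which each of the k nearest neighbors is weighted in proportion to an inverse power of its distance. This is an illustration in the spirit of the paper's setup, not the authors' exact construction: the function name, the exponent gamma, and the toy data are assumptions made for demonstration. With gamma = 0 the rule reduces to standard unweighted kNN, while any gamma > 0 forces the fitted function to interpolate the training data, since a neighbor's weight diverges as the query point approaches it.

    import numpy as np

    def interpolated_knn_predict(X_train, y_train, x_query, k=5, gamma=1.0):
        """Interpolated k-NN regression at a single query point (illustrative).

        Each of the k nearest neighbors receives weight proportional to
        dist ** (-gamma).  gamma = 0 recovers the standard unweighted k-NN
        average, while gamma > 0 makes the predictor interpolate the
        training data.
        """
        dists = np.linalg.norm(X_train - x_query, axis=1)
        nn_idx = np.argsort(dists)[:k]        # indices of the k nearest neighbors
        nn_dists = dists[nn_idx]

        # Exact hit: an interpolating predictor must return the training label.
        if np.any(nn_dists == 0):
            return float(y_train[nn_idx[nn_dists == 0]].mean())

        weights = nn_dists ** (-gamma)        # interpolating weights
        weights /= weights.sum()              # normalize to sum to one
        return float(np.dot(weights, y_train[nn_idx]))

    # Illustrative usage on a noisy 1-D regression problem.
    rng = np.random.default_rng(0)
    X = rng.uniform(0.0, 1.0, size=(50, 1))
    y = np.sin(2 * np.pi * X[:, 0]) + 0.1 * rng.normal(size=50)
    print(interpolated_knn_predict(X, y, np.array([0.3]), k=5, gamma=1.0))

Varying gamma in this sketch traces out exactly the trade-off the paper studies: the level of data interpolation is controlled by a single parameter, with unweighted kNN at one extreme and increasingly aggressive interpolation as gamma grows.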

Related research

10/05/2018 · Statistical Optimality of Interpolated Nearest Neighbor Algorithms
In the era of deep learning, understanding over-fitting phenomenon becom...

02/13/2020 · Predictive Power of Nearest Neighbors Algorithm under Random Perturbation
We consider a data corruption scenario in the classical k Nearest Neighb...

10/09/2021 · Theoretically Principled Deep RL Acceleration via Nearest Neighbor Function Approximation
Recently, deep reinforcement learning (RL) has achieved remarkable empir...

11/08/2012 · Nearest Neighbor Value Interpolation
This paper presents the nearest neighbor value (NNV) algorithm for high ...

10/19/2020 · Do Deeper Convolutional Networks Perform Better?
Over-parameterization is a recent topic of much interest in the machine ...

08/03/2020 · Multiple Descent: Design Your Own Generalization Curve
This paper explores the generalization loss of linear regression in vari...

12/11/2020 · Beyond Occam's Razor in System Identification: Double-Descent when Modeling Dynamics
System identification aims to build models of dynamical systems from dat...
