Why Size Matters: Feature Coding as Nystrom Sampling

01/15/2013
by Oriol Vinyals et al.

Recently, the computer vision and machine learning communities have favored feature extraction pipelines that rely on a coding step followed by a linear classifier, owing to their overall simplicity, the well understood properties of linear classifiers, and their computational efficiency. In this paper we propose a novel view of this pipeline based on kernel methods and Nystrom sampling. In particular, we focus on the coding of a data point with a local representation based on a dictionary with fewer elements than the number of data points, and view it as an approximation to the true function that would compute pairwise similarity to all data points (often too many to compute in practice), followed by a Nystrom sampling step that selects a subset of all data points. Furthermore, since bounds on the approximation power of Nystrom sampling are known as a function of the number of samples (i.e., the dictionary size), we can derive bounds on the approximation of the exact (but expensive to compute) kernel matrix, and use them as a proxy to predict accuracy as a function of dictionary size, which has been observed to increase but also to saturate as the dictionary grows. This model may help explain the positive effect of codebook size and justify the need to stack more layers (often referred to as deep learning), since flat models empirically saturate as more complexity is added.
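To make the connection concrete, here is a minimal numpy sketch (not the authors' code) of the Nystrom approximation the abstract refers to: m landmark points play the role of the dictionary, and the full n x n kernel matrix is approximated as K ~= C W^+ C^T, where C holds similarities to the landmarks and W is the landmark-landmark kernel. The RBF kernel choice, the uniform landmark sampling, and the helper names rbf_kernel and nystrom_approx are illustrative assumptions.

import numpy as np

def rbf_kernel(X, Y, gamma=1.0):
    # Pairwise squared distances, then a Gaussian (RBF) kernel.
    d2 = np.sum(X**2, 1)[:, None] + np.sum(Y**2, 1)[None, :] - 2 * X @ Y.T
    return np.exp(-gamma * d2)

def nystrom_approx(X, m, gamma=1.0, seed=0):
    # Sample m landmarks ("dictionary" elements) uniformly, then
    # approximate the exact kernel matrix K(X, X) as C W^+ C^T.
    rng = np.random.default_rng(seed)
    landmarks = X[rng.choice(len(X), size=m, replace=False)]
    C = rbf_kernel(X, landmarks, gamma)          # n x m similarities
    W = rbf_kernel(landmarks, landmarks, gamma)  # m x m landmark kernel
    return C @ np.linalg.pinv(W) @ C.T

# The relative error drops quickly and then saturates as the
# dictionary size m grows, mirroring the accuracy curve discussed above.
X = np.random.default_rng(1).normal(size=(500, 16))
K = rbf_kernel(X, X)
for m in (10, 50, 200):
    K_hat = nystrom_approx(X, m)
    print(m, np.linalg.norm(K - K_hat, "fro") / np.linalg.norm(K, "fro"))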

