Distributed Adaptive Sampling for Kernel Matrix Approximation

by Daniele Calandriello, et al.

Most kernel-based methods, such as kernel or Gaussian process regression, kernel PCA, ICA, or k-means clustering, do not scale to large datasets, because constructing and storing the kernel matrix K_n requires at least O(n^2) time and space for n samples. Recent works show that sampling points with replacement according to their ridge leverage scores (RLS) generates small dictionaries of relevant points with strong spectral approximation guarantees for K_n. The drawback of RLS-based methods is that computing exact RLS requires constructing and storing the whole kernel matrix. In this paper, we introduce SQUEAK, a new algorithm for kernel approximation based on RLS sampling that processes the dataset sequentially, storing a dictionary that yields accurate kernel matrix approximations with a number of points that depends only on the effective dimension d_eff(γ) of the dataset. Moreover, since all RLS estimates are computed efficiently using only the small dictionary, SQUEAK is the first RLS sampling algorithm that never constructs the whole matrix K_n, runs in time O(n d_eff(γ)^3), which is linear in n, and requires only a single pass over the dataset. We also propose a parallel and distributed version of SQUEAK that scales linearly across multiple machines, achieving similar accuracy in as little as O(log(n) d_eff(γ)^3) time.
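
To make the quantities in the abstract concrete, the sketch below computes exact ridge leverage scores and the effective dimension for a small dataset, then samples a dictionary with replacement in proportion to the scores. It is not SQUEAK: it is the exact O(n^2) baseline whose cost the abstract names as the drawback. It assumes the standard definitions tau_i(γ) = [K_n (K_n + γI)^{-1}]_ii and d_eff(γ) = sum_i tau_i(γ). The Gaussian kernel, the bandwidth sigma, the value of gamma, the dictionary size m, and all function names are illustrative choices, not taken from the paper.

```python
import numpy as np


def gaussian_kernel(X, Y, sigma=1.0):
    """Gaussian (RBF) kernel matrix between rows of X and rows of Y (illustrative kernel choice)."""
    sq_dists = (
        np.sum(X**2, axis=1)[:, None]
        + np.sum(Y**2, axis=1)[None, :]
        - 2.0 * X @ Y.T
    )
    # Clip tiny negative values caused by floating-point cancellation.
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * sigma**2))


def exact_ridge_leverage_scores(K, gamma):
    """Exact RLS: tau_i = [K (K + gamma I)^{-1}]_ii. Needs the full n x n matrix K."""
    n = K.shape[0]
    # K and (K + gamma I)^{-1} commute, so solving (K + gamma I) Z = K gives the same diagonal.
    Z = np.linalg.solve(K + gamma * np.eye(n), K)
    return np.diag(Z).copy()


def sample_dictionary(tau, m, rng):
    """Draw m indices with replacement, with probability proportional to the RLS."""
    probs = tau / tau.sum()
    return rng.choice(len(tau), size=m, replace=True, p=probs)


# Small synthetic example (all values below are hypothetical).
rng = np.random.default_rng(0)
n, gamma, m = 200, 1.0, 40
X = rng.normal(size=(n, 3))

K = gaussian_kernel(X, X)                  # the O(n^2) object SQUEAK avoids building
tau = exact_ridge_leverage_scores(K, gamma)
d_eff = tau.sum()                          # effective dimension d_eff(gamma)
idx = sample_dictionary(tau, m, rng)       # indices of the sampled dictionary points

print(f"n = {n}, d_eff(gamma) = {d_eff:.2f}, dictionary size = {m}")
```

Each score lies strictly between 0 and 1, so d_eff(γ) is always smaller than n, and it can be much smaller when the spectrum of K_n decays quickly. That gap is what lets a dictionary sized by d_eff(γ) rather than n stand in for the full matrix.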


Multiresolution Kernel Approximation for Gaussian Process Regression

Gaussian process regression generally does not scale to beyond a few tho...

Exact Sampling of Determinantal Point Processes without Eigendecomposition

Determinantal point processes (DPPs) enable the modelling of repulsion: ...

Recursive Sampling for the Nyström Method

We give the first algorithm for kernel Nyström approximation that runs i...

Near Input Sparsity Time Kernel Embeddings via Adaptive Sampling

To accelerate kernel methods, we propose a near input sparsity time algo...

Fast Randomized Kernel Methods With Statistical Guarantees

One approach to improving the running time of kernel-based machine learn...

Why Size Matters: Feature Coding as Nystrom Sampling

Recently, the computer vision and machine learning community has been in...

Efficient Dataset Distillation Using Random Feature Approximation

Dataset distillation compresses large datasets into smaller synthetic co...