Revisiting Random Binning Features: Fast Convergence and Strong Parallelizability

09/14/2018
by Lingfei Wu, et al.

Kernel methods have become one of the standard approaches to nonlinear learning, but they do not scale to large datasets owing to their quadratic complexity in the number of samples. A number of kernel approximation methods have therefore been proposed in recent years, among which random features are especially popular for their simplicity and for directly reducing a nonlinear problem to a linear one. The Random Binning (RB) feature, proposed in the first random-features paper (Rahimi and Recht, 2007), has drawn much less attention than the Random Fourier (RF) feature. In this work, we observe that RB features, with the right choice of optimization solver, can be orders of magnitude more efficient than other random features and kernel approximation methods at the same level of accuracy. We therefore give the first analysis of RB from the perspective of optimization: interpreting RB as Randomized Block Coordinate Descent in an infinite-dimensional space yields a faster convergence rate than those of other random features. In particular, we show that by drawing R random grids with at least κ non-empty bins per grid in expectation, the RB method achieves a convergence rate of O(1/(κR)), which not only sharpens its O(1/√R) rate from Monte Carlo analysis but also shows a κ-fold speedup over other random features under the same analysis framework. In addition, we demonstrate another advantage of RB in the L1-regularized setting: unlike other random features, an RB-based Coordinate Descent solver can be parallelized with a guaranteed speedup proportional to κ. Our extensive experiments demonstrate the superior performance of RB features over other random features and kernel approximation methods. Our code and data are available at <https://github.com/teddylfwu/RandomBinning>.
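To make the construction concrete, below is a minimal NumPy sketch of RB feature generation for the Laplacian kernel k(x, y) = exp(-‖x − y‖₁/σ), the setting used in Rahimi and Recht (2007). The function name `random_binning_features` and the dense output matrix are our own illustrative choices, not the paper's released code; a practical implementation would use sparse storage, since each sample activates exactly one bin per grid.

```python
import numpy as np

def random_binning_features(X, R=32, sigma=1.0, seed=None):
    """Random Binning features approximating the Laplacian kernel
    k(x, y) = exp(-||x - y||_1 / sigma) (Rahimi and Recht, 2007).

    Returns an (n, D) matrix Phi with Phi @ Phi.T approximating the
    kernel matrix; each of the R grids activates one bin per sample.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    col_of_bin = {}                       # maps (grid, bin id) -> feature column
    feat_col = np.empty((n, R), dtype=np.int64)
    for r in range(R):
        # Grid pitch per dimension: delta ~ Gamma(shape=2, scale=sigma);
        # a random offset u ~ Uniform[0, delta) shifts the grid.
        delta = rng.gamma(shape=2.0, scale=sigma, size=d)
        u = rng.uniform(0.0, delta)
        bins = np.floor((X - u) / delta).astype(np.int64)  # (n, d) bin indices
        for i in range(n):
            key = (r, bins[i].tobytes())  # only non-empty bins get columns
            feat_col[i, r] = col_of_bin.setdefault(key, len(col_of_bin))
    Phi = np.zeros((n, len(col_of_bin)))
    Phi[np.arange(n)[:, None], feat_col] = 1.0 / np.sqrt(R)  # one-hot per grid
    return Phi
```

Note that the columns produced by a single grid form a partition indicator, so a block coordinate descent step over one grid updates disjoint coordinates. This is exactly the block structure that the paper's RBCD interpretation and its parallel L1-regularized solver exploit.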

