Efficient Convex Algorithms for Universal Kernel Learning

by Aleksandr Talitckii, et al.

The accuracy and complexity of machine learning algorithms based on kernel optimization are determined by the set of kernels over which they are able to optimize. An ideal set of kernels should: admit a linear parameterization (for tractability); be dense in the set of all kernels (for robustness); be universal (for accuracy). Recently, a framework was proposed for using positive matrices to parameterize a class of positive semi-separable kernels. Although this class can be shown to meet all three criteria, previous algorithms for optimizing such kernels were limited to classification and, furthermore, relied on computationally complex Semidefinite Programming (SDP) algorithms. In this paper, we pose the problem of learning semi-separable kernels as a minimax optimization problem and propose an SVD-QCQP primal-dual algorithm which dramatically reduces the computational complexity compared with previous SDP-based approaches. Furthermore, we provide an efficient implementation of this algorithm for both classification and regression, which enables us to solve problems with 100 features and up to 30,000 data points. Finally, when applied to benchmark data, the algorithm demonstrates the potential for significant improvement in accuracy over typical (but non-convex) approaches such as Neural Nets and Random Forest, with similar or better computation time.
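
To make the idea of a "linear parameterization by positive matrices" concrete, here is a minimal sketch, not the paper's construction. It assumes a hypothetical feature map z(x) = (1, x, x^2) and defines k_P(x, y) = z(x)^T P z(y) for a positive semidefinite matrix P. All names (feature_map, kernel, gram, psd_from_factor, quad_form) are illustrative. The kernel is linear in P, so optimizing over kernels of this form reduces to optimizing over a convex set of matrices, and any P built as A A^T yields a positive semidefinite Gram matrix.

```python
# Illustrative only: a kernel that is linear in a positive semidefinite matrix P.
# The feature map is a hypothetical stand-in, not the paper's kernel class.

def feature_map(x):
    """Hypothetical feature map: monomials 1, x, x^2 of a scalar input."""
    return [1.0, x, x * x]

def kernel(P, x, y):
    """k_P(x, y) = z(x)^T P z(y); linear in the entries of P."""
    zx, zy = feature_map(x), feature_map(y)
    return sum(zx[i] * P[i][j] * zy[j] for i in range(3) for j in range(3))

def gram(P, xs):
    """Gram matrix of k_P on the sample points xs."""
    return [[kernel(P, a, b) for b in xs] for a in xs]

def psd_from_factor(A):
    """Build P = A A^T, which is positive semidefinite by construction."""
    n, m = len(A), len(A[0])
    return [[sum(A[i][k] * A[j][k] for k in range(m)) for j in range(n)]
            for i in range(n)]

def quad_form(K, c):
    """Evaluate c^T K c; nonnegative for every c when K is PSD."""
    n = len(c)
    return sum(c[i] * K[i][j] * c[j] for i in range(n) for j in range(n))

# Two arbitrary factors chosen for illustration.
A1 = [[1.0, 0.0, 0.0], [0.5, 1.0, 0.0], [0.0, -0.3, 2.0]]
A2 = [[0.2, 0.0, 0.0], [0.0, 0.7, 0.0], [1.0, 0.0, 0.1]]
P1, P2 = psd_from_factor(A1), psd_from_factor(A2)

xs = [-1.0, -0.5, 0.0, 0.5, 1.0]
K1, K2 = gram(P1, xs), gram(P2, xs)

# Linearity: the Gram matrix of P1 + P2 equals the sum of the Gram matrices.
P_sum = [[P1[i][j] + P2[i][j] for j in range(3)] for i in range(3)]
K_sum = gram(P_sum, xs)
```

Because the map from P to the Gram matrix is linear, the set of Gram matrices reachable with P positive semidefinite is convex, which is what makes a convex (minimax) formulation of kernel learning possible in this simplified setting.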




A New Algorithm for Tessellated Kernel Learning

The accuracy and complexity of machine learning algorithms based on kern...

A Convex Parametrization of a New Class of Universal Kernel Functions for use in Kernel Learning

We propose a new class of universal kernel functions which admit a linea...

Conditional mean embeddings and optimal feature selection via positive definite kernels

Motivated by applications, we consider here new operator theoretic appro...

Learning the kernel matrix via predictive low-rank approximations

Efficient and accurate low-rank approximations of multiple data sources ...

The Random Forest Kernel and other kernels for big data from random partitions

We present Random Partition Kernels, a new class of kernels derived by d...

Spectral Non-Convex Optimization for Dimension Reduction with Hilbert-Schmidt Independence Criterion

The Hilbert Schmidt Independence Criterion (HSIC) is a kernel dependence...

MKL-RT: Multiple Kernel Learning for Ratio-trace Problems via Convex Optimization

In the recent past, automatic selection or combination of kernels (or fe...
