Similarity Learning for High-Dimensional Sparse Data

11/10/2014
by Kuan Liu, et al.

A good measure of similarity between data points is crucial to many tasks in machine learning. Similarity and metric learning methods learn such measures automatically from data, but they do not scale well with respect to the dimensionality of the data. In this paper, we propose a method that can efficiently learn a similarity measure from high-dimensional sparse data. The core idea is to parameterize the similarity measure as a convex combination of rank-one matrices with specific sparsity structures. The parameters are then optimized with an approximate Frank-Wolfe procedure to maximally satisfy relative similarity constraints on the training data. Our algorithm greedily incorporates one pair of features at a time into the similarity measure, providing an efficient way to control the number of active features and thus reduce overfitting. It enjoys appealing convergence guarantees, and its time and memory complexity depend on the sparsity of the data instead of the dimension of the feature space. Our experiments on real-world high-dimensional datasets demonstrate its potential for classification, dimensionality reduction and data exploration.
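To make the idea in the abstract concrete, here is a minimal sketch of a Frank-Wolfe loop that adds one pair of features per iteration. It is an illustration under several assumptions that the abstract does not state: the similarity is taken to be bilinear, S_M(x, y) = x^T M y; each rank-one basis matrix is assumed to have the form (e_i ± e_j)(e_i ± e_j)^T, scaled by a hypothetical `scale` parameter; the relative similarity constraints are assumed to be triplets (a, p, n), meaning "a should be more similar to p than to n", penalized with a squared hinge loss; and the step size is the standard 2/(k+2). The sketch uses dense NumPy arrays and an exact search over all feature pairs for readability, so it does not reproduce the approximate procedure or the sparsity-dependent complexity described above. All function names are hypothetical.

```python
import numpy as np


def loss_and_grad(M, X, triplets, margin=1.0):
    """Average squared hinge loss over triplets (a, p, n) and its gradient in M.

    Each triplet asks that x_a^T M x_p exceed x_a^T M x_n by `margin`.
    """
    a, p, n = triplets.T
    diff = X[p] - X[n]                                   # shape (T, d)
    # Violation of each constraint: margin - x_a^T M (x_p - x_n)
    slack = margin - np.einsum("td,de,te->t", X[a], M, diff)
    active = np.maximum(slack, 0.0)
    loss = np.mean(active ** 2)
    # Gradient of slack_t with respect to M is -x_a (x_p - x_n)^T
    grad = -2.0 * (X[a] * active[:, None]).T @ diff / len(triplets)
    return loss, grad


def best_pair(G):
    """Linear minimization step: pick (i, j, sign) minimizing <uu^T, G>,
    where u = e_i + sign * e_j and i < j.

    For such u, <uu^T, G> = G_ii + G_jj + sign * (G_ij + G_ji).
    """
    d = G.shape[0]
    diag = np.diag(G)
    base = diag[:, None] + diag[None, :]
    cross = G + G.T
    rows, cols = np.triu_indices(d, k=1)
    plus = (base + cross)[rows, cols]
    minus = (base - cross)[rows, cols]
    kp, km = np.argmin(plus), np.argmin(minus)
    if plus[kp] <= minus[km]:
        return rows[kp], cols[kp], 1.0
    return rows[km], cols[km], -1.0


def frank_wolfe_similarity(X, triplets, scale=1.0, n_iters=50):
    """Build M as a convex combination of scaled rank-one pair matrices."""
    d = X.shape[1]
    M = np.zeros((d, d))
    history = []
    for k in range(n_iters):
        loss, G = loss_and_grad(M, X, triplets)
        history.append(loss)
        i, j, sign = best_pair(G)
        u = np.zeros(d)
        u[i], u[j] = 1.0, sign
        gamma = 2.0 / (k + 2.0)      # gamma = 1 at k = 0, so the start point is discarded
        M = (1.0 - gamma) * M + gamma * scale * np.outer(u, u)
    return M, history
```

Because each iteration touches only two coordinates, stopping after `n_iters` steps leaves at most `2 * n_iters` active features in `M`, which is the sense in which the number of iterations acts as a handle on model complexity in this sketch.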


research
07/20/2018

Escaping the Curse of Dimensionality in Similarity Learning: Efficient Frank-Wolfe Algorithm and Generalization Bounds

Similarity and metric learning provides a principled approach to constru...
research
04/15/2021

Sparse online relative similarity learning

For many data mining and machine learning tasks, the quality of a simila...
research
04/30/2021

Ranking the information content of distance measures

Real-world data typically contain a large number of features that are of...
research
11/19/2015

Fast Metric Learning For Deep Neural Networks

Similarity metrics are a core component of many information retrieval an...
research
05/05/2015

Large-scale Classification of Fine-Art Paintings: Learning The Right Metric on The Right Feature

In the past few years, the number of fine-art collections that are digit...
research
11/26/2018

Multiscale geometric feature extraction for high-dimensional and non-Euclidean data with application

A method for extracting multiscale geometric features from a data cloud ...
research
03/30/2021

Structured Inverted-File k-Means Clustering for High-Dimensional Sparse Data

This paper presents an architecture-friendly k-means clustering algorith...
