On Tree-based Methods for Similarity Learning

06/21/2019
by   Stephan Clémençon, et al.
0

In many situations, the choice of an adequate similarity measure or metric on the feature space dramatically determines the performance of machine learning methods. Building automatically such measures is the specific purpose of metric/similarity learning. In Vogel et al. (2018), similarity learning is formulated as a pairwise bipartite ranking problem: ideally, the larger the probability that two observations in the feature space belong to the same class (or share the same label), the higher the similarity measure between them. From this perspective, the ROC curve is an appropriate performance criterion and it is the goal of this article to extend recursive tree-based ROC optimization techniques in order to propose efficient similarity learning algorithms. The validity of such iterative partitioning procedures in the pairwise setting is established by means of results pertaining to the theory of U-processes and from a practical angle, it is discussed at length how to implement them by means of splitting rules specifically tailored to the similarity learning task. Beyond these theoretical/methodological contributions, numerical experiments are displayed and provide strong empirical evidence of the performance of the algorithmic approaches we propose.

READ FULL TEXT
research
07/18/2018

A Probabilistic Theory of Supervised Similarity Learning for Pointwise ROC Curve Optimization

The performance of many machine learning techniques depends on the choic...
research
01/17/2018

Ranking Data with Continuous Labels through Oriented Recursive Partitions

We formulate a supervised learning problem, referred to as continuous ra...
research
08/05/2020

Fuzzy Jaccard Index: A robust comparison of ordered lists

We propose Fuzzy Jaccard Index (FUJI) – a scale-invariant score for asse...
research
11/01/2022

On Medians of (Randomized) Pairwise Means

Tournament procedures, recently introduced in Lugosi Mendelson (2016...
research
06/27/2012

Similarity Learning for Provably Accurate Sparse Linear Classification

In recent years, the crucial importance of metrics in machine learning a...
research
06/29/2012

A Hybrid Method for Distance Metric Learning

We consider the problem of learning a measure of distance among vectors ...
research
02/17/2022

Occupation similarity through bipartite graphs

Similarity between occupations is a crucial piece of information when ma...

Please sign up or login with your details

Forgot password? Click here to reset