Encoding of data sets and algorithms

03/02/2023
by Katarina Doctor, et al.

In many high-impact applications, it is important to ensure both the quality of a machine learning algorithm's output and its reliability relative to the complexity of the algorithm used. In this paper, we initiate a mathematically rigorous theory for deciding which models (algorithms applied to data sets) are close to each other with respect to certain metrics, such as performance and the complexity level of the algorithm. This involves creating a grid on the hypothetical spaces of data sets and algorithms in order to identify a finite set of probability distributions from which the data sets are sampled and a finite set of algorithms. A given threshold metric acting on this grid expresses the nearness (or statistical distance) of each algorithm and data set of interest to any given application. A technically difficult part of this project is estimating the so-called metric entropy of a compact subset of functions of infinitely many variables that arise in the definition of these spaces.
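As a rough illustration of the "grid" and "metric entropy" ideas mentioned above, the following sketch builds a finite epsilon-net on a toy family of probability distributions and takes the logarithm of its size. It is not the construction used in the paper, which concerns compact sets of functions of infinitely many variables. The Bernoulli family, the choice of total variation as the statistical distance, and the names `greedy_epsilon_net` and `total_variation` are all assumptions made for this example.

```python
import math
from typing import Callable, List, Sequence, Tuple

Dist = Tuple[float, float]  # a Bernoulli distribution as (P(0), P(1))


def total_variation(p: Dist, q: Dist) -> float:
    """Total variation distance between two discrete distributions."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def greedy_epsilon_net(
    points: Sequence[Dist],
    dist: Callable[[Dist, Dist], float],
    eps: float,
) -> List[Dist]:
    """Greedily pick centers so every point lies within eps of a center."""
    centers: List[Dist] = []
    for p in points:
        # Keep p as a new grid point only if no existing center covers it.
        if all(dist(p, c) > eps for c in centers):
            centers.append(p)
    return centers


# Hypothetical toy family: Bernoulli(k/100) for k = 0, ..., 100.
family = [(1 - k / 100, k / 100) for k in range(101)]
eps = 0.1

net = greedy_epsilon_net(family, total_variation, eps)

# The net is an eps-cover, so its size bounds the covering number from
# above, and log2 of its size bounds the metric entropy at scale eps.
entropy_upper_bound = math.log2(len(net))
print(len(net), entropy_upper_bound)
```

Each distribution in the family is then represented by its nearest center, which plays the role of a grid point. Shrinking `eps` gives a finer grid and a larger entropy bound.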


