A Topological Approach to Measuring Training Data Quality

06/04/2023
by Álvaro Torras-Casas, et al.

Data quality is crucial for the successful training, generalization and performance of artificial intelligence models. Furthermore, it is known that the leading approaches in artificial intelligence are notoriously data-hungry. In this paper, we propose the use of small training datasets towards faster training. Specifically, we provide a novel topological method based on morphisms between persistence modules to measure the training data quality with respect to the complete dataset. This way, we can provide an explanation of why the chosen training dataset will lead to poor performance.
