Learnability of Learning Performance and Its Application to Data Valuation

07/13/2021
by Tianhao Wang, et al.

For most machine learning (ML) tasks, evaluating learning performance on a given dataset requires intensive computation. At the same time, the ability to estimate learning performance efficiently could benefit a wide spectrum of applications, such as active learning, data quality management, and data valuation. Recent empirical studies show that, for many common ML models, one can accurately learn a parametric model that predicts learning performance for any given input dataset using a small number of samples. However, the theoretical underpinning of the learnability of such performance prediction models is still missing. In this work, we develop the first theoretical analysis of the ML performance learning problem. We propose a relaxed notion of submodularity that describes well the behavior of learning performance as a function of the input dataset. We give a learning algorithm that achieves a constant-factor approximation under certain assumptions. Further, we give a learning algorithm that achieves arbitrarily small error based on a newly derived structural result. We then discuss a natural and important use case of performance learning: data valuation, which is known to suffer from computational challenges because it requires estimating learning performance for many data combinations. We show that performance learning can significantly improve the accuracy of data valuation.
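To make the data-valuation use case concrete, here is a minimal sketch, assuming the Shapley value as the valuation scheme (a common choice; the abstract does not name a specific one). The Shapley value of point i averages its marginal contribution U(S ∪ {i}) − U(S) over subsets S, which the sketch estimates by Monte Carlo permutation sampling. Each permutation calls the utility U once per data point, which is why valuation needs performance estimates for many data combinations. In the setting the abstract describes, `utility` would be a learned performance predictor instead of retraining a model on every subset. The function name `monte_carlo_shapley` and the toy surrogate below are hypothetical illustrations, not the paper's algorithm.

```python
import math
import random
from typing import Callable, FrozenSet, List

# A utility maps a subset of data-point indices to a performance score.
# Hypothetical: in practice this could be a learned performance predictor.
Utility = Callable[[FrozenSet[int]], float]


def monte_carlo_shapley(
    n: int, utility: Utility, num_perms: int = 200, seed: int = 0
) -> List[float]:
    """Estimate Shapley values of n data points by averaging each point's
    marginal contribution over randomly sampled permutations."""
    rng = random.Random(seed)
    values = [0.0] * n
    for _ in range(num_perms):
        perm = list(range(n))
        rng.shuffle(perm)
        prefix: set = set()
        prev = utility(frozenset(prefix))  # utility of the empty set
        for i in perm:
            prefix.add(i)
            cur = utility(frozenset(prefix))
            values[i] += cur - prev  # marginal contribution of point i
            prev = cur
    return [v / num_perms for v in values]


# Toy surrogate (hypothetical): a concave function of summed per-point
# weights, which is monotone and submodular (diminishing returns).
weights = [4.0, 1.0, 1.0, 0.0]


def surrogate(subset: FrozenSet[int]) -> float:
    return math.sqrt(sum(weights[i] for i in subset))


vals = monte_carlo_shapley(len(weights), surrogate, num_perms=500)
print([round(v, 3) for v in vals])
```

Because the marginal contributions within each permutation telescope, the estimates always sum to U(full set) − U(∅); a point whose weight is zero receives a value of exactly zero. Swapping a cheap predictor in for `surrogate` is what makes the many utility calls affordable.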

