Minimax rate of consistency for linear models with missing values

02/03/2022
by Alexis Ayme, et al.

Missing values arise in most real-world data sets due to the aggregation of multiple sources and intrinsically missing information (sensor failures, unanswered questions in surveys, ...). In fact, the very nature of missing values usually prevents us from running standard learning algorithms. In this paper, we focus on the extensively studied linear models, but in the presence of missing values, which turns out to be quite a challenging task. Indeed, the Bayes rule can be decomposed as a sum of predictors corresponding to each missing pattern. This eventually requires solving a number of learning tasks exponential in the number of input features, which makes prediction impossible for current real-world datasets. First, we propose a rigorous setting to analyze a least-squares-type estimator and establish a bound on the excess risk that increases exponentially in the dimension. Consequently, we leverage the missing-data distribution to propose a new algorithm and derive associated adaptive risk bounds that turn out to be minimax optimal. Numerical experiments highlight the benefits of our method compared to state-of-the-art algorithms used for predictions with missing values.
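To make the pattern-wise decomposition concrete, here is a minimal sketch of the naive pattern-by-pattern least-squares approach mentioned in the abstract: one ordinary-least-squares regressor is fit per missing pattern, which is why the number of learning tasks can grow exponentially with the number of features. This is only an illustration assuming NumPy; the function names and the toy data are mine and this is not the paper's adaptive algorithm.

```python
# Sketch (not from the paper): one OLS regressor per missing-data pattern.
import numpy as np

def fit_by_pattern(X, y):
    """Fit one least-squares regressor per missing pattern (naive approach)."""
    patterns = np.isnan(X)
    models = {}
    for key in {tuple(row) for row in patterns}:
        mask = np.all(patterns == np.array(key), axis=1)  # samples sharing this pattern
        obs = ~np.array(key)                              # observed coordinates
        Xm = X[mask][:, obs]
        Xm = np.hstack([np.ones((Xm.shape[0], 1)), Xm])   # add intercept column
        beta, *_ = np.linalg.lstsq(Xm, y[mask], rcond=None)
        models[key] = (obs, beta)
    return models

def predict_by_pattern(models, x):
    """Route a test point to the regressor trained on its missing pattern."""
    key = tuple(np.isnan(x))
    obs, beta = models[key]  # raises KeyError if the pattern was never seen in training
    return np.hstack([1.0, x[obs]]) @ beta

# Toy usage: 3 features, entries missing completely at random.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=200)
X[rng.random(X.shape) < 0.2] = np.nan
models = fit_by_pattern(X, y)
print(predict_by_pattern(models, np.array([0.3, np.nan, -1.2])))
```

With d features there can be up to 2^d distinct patterns, each needing its own fit and enough samples, which is the exponential blow-up the paper's new algorithm is designed to avoid.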


Related research

02/10/2020 · Missing Data Imputation using Optimal Transport
Missing data is a crucial issue when applying machine learning algorithm...

05/30/2023 · Adapting Fairness Interventions to Missing Values
Missing values in real-world data pose a significant and unique challeng...

05/19/2023 · Nonparametric classification with missing data
We introduce a new nonparametric framework for classification problems i...

09/30/2021 · LIFE: Learning Individual Features for Multivariate Time Series Prediction with Missing Values
Multivariate time series (MTS) prediction is ubiquitous in real-world fi...

02/21/2020 · Debiasing Stochastic Gradient Descent to handle missing values
A major caveat of large-scale data is their incompleteness. We propose ...

07/03/2020 · Neumann networks: differential programming for supervised learning with missing values
The presence of missing values makes supervised learning much more chall...

07/19/2018 · Unrolling Swiss Cheese: Metric repair on manifolds with holes
For many machine learning tasks, the input data lie on a low-dimensional...
