PKLM: A flexible MCAR test using Classification

09/21/2021
by Loris Michel, et al.

We develop a fully non-parametric, fast, easy-to-use, and powerful test for the missing completely at random (MCAR) assumption on the missingness mechanism of a data set. The test compares the distributions of different missingness patterns on random projections in the variable space of the data. The distributional differences are measured with the Kullback-Leibler divergence, using probability random forests. We thus refer to it as the "Projected Kullback-Leibler MCAR" (PKLM) test. The use of random projections makes it applicable even if very few or no fully observed observations are available, or if the number of dimensions is large. An efficient permutation approach guarantees the level for any finite sample size, resolving a major shortcoming of most other available tests. Moreover, the test can be used on both discrete and continuous data. We show empirically, on a range of simulated data distributions and real data sets, that our test has consistently high power and avoids inflated type I errors. Finally, we provide an R package with an implementation of our test.
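The permutation idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the PKLM test itself: the statistic below is a deliberately simple stand-in (a gap in means) rather than the Kullback-Leibler divergence estimated with probability random forests, and no random projections are used. The names `mean_gap` and `permutation_pvalue` and the simulated data are assumptions made for illustration only. What the sketch does show is how shuffling the missingness indicator across rows yields a p-value of the form (1 + number of exceedances) / (1 + number of permutations), which is the standard construction that keeps a permutation test at its nominal level in finite samples.

```python
import random


def mean_gap(x, missing):
    """Toy statistic (a stand-in, NOT the KL-based PKLM statistic):
    absolute gap between the mean of x on rows where the other
    variable is missing and the mean on rows where it is observed."""
    miss = [v for v, m in zip(x, missing) if m]
    obs = [v for v, m in zip(x, missing) if not m]
    if not miss or not obs:
        return 0.0
    return abs(sum(miss) / len(miss) - sum(obs) / len(obs))


def permutation_pvalue(x, missing, n_perm=199, seed=0):
    """Permutation p-value: shuffle the missingness indicator across
    rows, recompute the statistic, and count how often the permuted
    value is at least as large as the observed one."""
    rng = random.Random(seed)
    observed = mean_gap(x, missing)
    labels = list(missing)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(labels)
        if mean_gap(x, labels) >= observed:
            exceed += 1
    # The "+1" terms count the observed statistic as one of the draws.
    return (1 + exceed) / (1 + n_perm)


# Hypothetical simulated data: x is always observed; a second
# variable (not stored here) is missing according to the indicator.
rng = random.Random(1)
x = [rng.gauss(0, 1) for _ in range(200)]

mcar = [rng.random() < 0.3 for _ in x]  # missingness independent of x
mar = [v > 0 for v in x]                # missingness depends on x

p_mcar = permutation_pvalue(x, mcar)
p_mar = permutation_pvalue(x, mar)
print("p-value under MCAR:", p_mcar)
print("p-value when missingness depends on x:", p_mar)
```

In this toy setting the statistic needs a fully observed variable to compare across groups. The abstract states that the random projections in PKLM remove that requirement, which the sketch does not attempt to reproduce.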


