Phase transition in PCA with missing data: Reduced signal-to-noise ratio, not sample size!

05/02/2019
by Niels Bruun Ipsen, et al.

How does missing data affect our ability to learn signal structures? Learning signal structure in terms of principal components has been shown to depend on the ratio of sample size to dimensionality, with a critical number of observations needed before learning starts (Biehl and Mietzner, 1993). Here we generalize this analysis to include missing data. Probabilistic principal component analysis is regularly used for estimating signal structures in datasets with missing data. Our analytic result suggests that missing data effectively reduces the signal-to-noise ratio rather than, as is generally believed, the sample size. The theory predicts a phase transition in the learning curves, and this is indeed found both in simulated data and in real datasets.
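To make the learning-curve idea concrete, here is a minimal simulation sketch. It is not the authors' method: instead of probabilistic PCA it uses a simpler stand-in, the pairwise-complete covariance estimate followed by an eigendecomposition. It assumes a single zero-mean signal direction, unit-variance isotropic Gaussian noise, and entries missing completely at random. The function name `overlap_with_missing` and all of its parameters are hypothetical and chosen for illustration only.

```python
import numpy as np


def overlap_with_missing(n_samples, dim, signal_std, keep_prob, seed=0):
    """Return |cos(angle)| between the true and estimated leading direction.

    Assumptions (not from the paper): one signal direction, unit noise
    variance, known zero mean, entries observed independently with
    probability keep_prob.
    """
    rng = np.random.default_rng(seed)

    # True signal direction, normalised to unit length.
    b = rng.standard_normal(dim)
    b /= np.linalg.norm(b)

    # Data: rank-one signal plus isotropic Gaussian noise.
    y = rng.standard_normal((n_samples, 1))
    x = signal_std * y * b + rng.standard_normal((n_samples, dim))

    # Missing-completely-at-random mask; missing entries are zero-filled.
    mask = rng.random((n_samples, dim)) < keep_prob
    x_zero_filled = np.where(mask, x, 0.0)

    # Pairwise-complete covariance: each entry is averaged only over the
    # samples in which both coordinates were observed.
    m = mask.astype(float)
    counts = np.maximum(m.T @ m, 1.0)  # guard against division by zero
    cov = (x_zero_filled.T @ x_zero_filled) / counts

    # eigh returns eigenvalues in ascending order, so the last column is
    # the leading eigenvector.
    _, eigvecs = np.linalg.eigh(cov)
    v = eigvecs[:, -1]
    return float(abs(b @ v))


if __name__ == "__main__":
    dim = 50
    for alpha in (0.5, 1.0, 2.0, 5.0, 20.0, 100.0):
        n = int(alpha * dim)
        full = overlap_with_missing(n, dim, signal_std=1.0, keep_prob=1.0)
        part = overlap_with_missing(n, dim, signal_std=1.0, keep_prob=0.7)
        print(f"alpha={alpha:6.1f}  complete={full:.3f}  30% missing={part:.3f}")
```

Plotting the overlap against `alpha` (samples per dimension) for several values of `keep_prob` gives empirical learning curves to compare with the predicted transition. At this small dimension the transition will look smoothed rather than sharp.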


research
05/10/2023

Blockwise Principal Component Analysis for monotone missing data imputation and dimensionality reduction

Monotone missing data is a common problem in data analysis. However, imp...
research
05/24/2021

Deconvolution density estimation with penalised MLE

Deconvolution is the important problem of estimating the distribution of...
research
02/21/2023

Density Ratio Estimation and Neyman Pearson Classification with Missing Data

Density Ratio Estimation (DRE) is an important machine learning techniqu...
research
12/17/2021

The Effect of Sample Size and Missingness on Inference with Missing Data

When are inferences (whether Direct-Likelihood, Bayesian, or Frequentist...
research
12/24/2017

Efficient data augmentation techniques for Gaussian state space models

We propose a data augmentation scheme for improving the rate of converge...
research
11/30/2021

HyperPCA: a Powerful Tool to Extract Elemental Maps from Noisy Data Obtained in LIBS Mapping of Materials

Laser-induced breakdown spectroscopy is a preferred technique for fast a...
research
01/31/2019

Phase Transition in the Recovery of Rank One Matrices Corrupted by Gaussian Noise

In datasets where the number of parameters is fixed and the number of sa...
