Establishing strong imputation performance of a denoising autoencoder in a wide range of missing data problems

04/06/2020
by Najmeh Abiri, et al.

Missing data is unavoidable in data analysis. Although powerful imputation methods exist, there is still much room for improvement. In this study, we examined single imputation based on deep autoencoders, motivated by the apparent success of deep learning in efficiently extracting useful features from datasets. We developed a consistent framework for both training and imputation, and benchmarked the results against state-of-the-art imputation methods on datasets of different sizes and characteristics. The work was not limited to datasets with a single variable type; we also imputed missing data in datasets with mixed variable types, e.g., a combination of binary, categorical, and continuous attributes. To evaluate the imputation methods, we randomly corrupted complete data with varying degrees of corruption and then compared the imputed values with the original ones. In all experiments, the developed autoencoder obtained the smallest error at every level of initial data corruption.
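The evaluation protocol described in the abstract (corrupt complete data at several rates, impute, compare against the original values) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. It rests on several assumptions: entries are removed completely at random, the error is measured as RMSE over the removed entries only, and a simple column-mean imputer stands in for the autoencoder. All function names and the synthetic data are hypothetical.

```python
import numpy as np


def corrupt_completely_at_random(data, missing_rate, rng):
    """Return a copy of `data` with random entries set to NaN, plus the mask of removed entries.

    Assumption: each entry is removed independently with probability `missing_rate`.
    """
    mask = rng.random(data.shape) < missing_rate
    corrupted = data.astype(float).copy()
    corrupted[mask] = np.nan
    return corrupted, mask


def mean_impute(corrupted):
    """Placeholder imputer (NOT the paper's autoencoder): fill NaNs with observed column means."""
    column_means = np.nanmean(corrupted, axis=0)
    return np.where(np.isnan(corrupted), column_means, corrupted)


def imputation_rmse(original, imputed, mask):
    """Root-mean-square error computed only over the entries that were removed."""
    diff = original[mask] - imputed[mask]
    return float(np.sqrt(np.mean(diff ** 2)))


rng = np.random.default_rng(0)
complete = rng.normal(size=(500, 8))  # synthetic stand-in for a complete dataset

results = {}
for rate in (0.1, 0.3, 0.5):  # illustrative corruption levels
    corrupted, mask = corrupt_completely_at_random(complete, rate, rng)
    imputed = mean_impute(corrupted)
    results[rate] = imputation_rmse(complete, imputed, mask)

for rate, error in results.items():
    print(f"missing rate {rate:.0%}: RMSE on removed entries = {error:.3f}")
```

Replacing `mean_impute` with any other imputer that maps a NaN-containing array to a filled array allows methods to be compared under the same corruption masks.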

research
05/08/2017

Multiple Imputation Using Deep Denoising Autoencoders

Missing data is a well-recognized problem impacting all domains. State-o...
research
10/28/2016

Missing Data Imputation for Supervised Learning

This paper compares methods for imputing missing categorical data for su...
research
06/30/2021

DAEMA: Denoising Autoencoder with Mask Attention

Missing data is a recurrent and challenging problem, especially when usi...
research
09/09/2021

Evaluation of imputation techniques with varying percentage of missing data

Missing data is a common problem which has consistently plagued statisti...
research
02/19/2020

Multiple Imputation with Denoising Autoencoder using Metamorphic Truth and Imputation Feedback

Although data may be abundant, complete data is less so, due to missing ...
research
02/21/2023

Spatio-Temporal Denoising Graph Autoencoders with Data Augmentation for Photovoltaic Timeseries Data Imputation

The integration of the global Photovoltaic (PV) market with real time da...
research
05/10/2022

Explainable Data Imputation using Constraints

Data values in a dataset can be missing or anomalous due to mishandling ...
