Development of a Neural Network-based Method for Improved Imputation of Missing Values in Time Series Data by Repurposing DataWig

08/18/2023
by Daniel Zhang, et al.

Time series data are observations collected over time intervals. Successful analysis of time series data captures patterns such as trends, cyclicity, and irregularity, which are crucial for decision making in research, business, and governance. However, missing values occur often in time series data and obstruct successful analysis, so they must be filled with substitute values, a process called imputation. Although various approaches have been attempted for robust imputation of time series data, even the most advanced methods still face challenges, including limited scalability, poor capacity to handle heterogeneous data types, and inflexibility due to requiring strong assumptions about missing-data mechanisms. Moreover, the imputation accuracy of these methods still has room for improvement. In this study, I developed tsDataWig (time-series DataWig) by modifying DataWig, a neural network-based method that can process large datasets and heterogeneous data types but was designed for imputing non-time series data. Unlike the original DataWig, tsDataWig can directly handle values of time variables and impute missing values in complex time series datasets. Using one simulated and three different complex real-world time series datasets, I demonstrated that tsDataWig outperforms the original DataWig and the current state-of-the-art methods for time series data imputation, and that it potentially has broad application because it does not require strong assumptions about missing-data mechanisms. This study provides a valuable solution for robustly imputing missing values in challenging time series datasets, which often contain millions of samples, high-dimensional variables, and heterogeneous data types.
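To make the term "imputation" concrete, the sketch below fills gaps in a small series by linear interpolation over elapsed time. It is not tsDataWig or DataWig, only a simple baseline for illustration. The function name `impute_linear` and the sample data are hypothetical, and the code assumes timestamps are sorted and distinct.

```python
from datetime import datetime
from typing import List, Optional, Sequence


def impute_linear(
    times: Sequence[datetime], values: Sequence[Optional[float]]
) -> List[Optional[float]]:
    """Fill None entries by linear interpolation over elapsed time.

    Illustrative baseline only (not the tsDataWig method).
    Assumes `times` is sorted with no duplicate timestamps.
    Gaps at either end copy the nearest observed value.
    """
    # Elapsed seconds since the first timestamp, so irregular spacing counts.
    t = [(ts - times[0]).total_seconds() for ts in times]
    known = [i for i, v in enumerate(values) if v is not None]
    if not known:
        raise ValueError("no observed values to impute from")

    filled = list(values)
    for i, v in enumerate(values):
        if v is not None:
            continue
        # Nearest observed neighbours on each side, if any.
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            filled[i] = values[right]
        elif right is None:
            filled[i] = values[left]
        else:
            # Weight by position in time between the two neighbours.
            w = (t[i] - t[left]) / (t[right] - t[left])
            filled[i] = values[left] + w * (values[right] - values[left])
    return filled


# Hypothetical, irregularly spaced hourly readings with two gaps.
times = [datetime(2023, 1, 1, h) for h in (0, 1, 4, 6)]
series = [10.0, None, 40.0, None]
imputed = impute_linear(times, series)
```

A baseline like this uses only the time axis and one variable. The abstract's point is that harder datasets, with many variables and mixed data types, call for a learned model instead.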


