A computational study on imputation methods for missing environmental data

08/21/2021
by Paul Dixneuf, et al.

Acquiring data and recording it in databases are routine operations. The collection process, however, may experience irregularities, resulting in databases with missing data. Missing entries can degrade the efficiency of analyses and, consequently, the associated decision-making process. This paper focuses on databases that collect information related to the natural environment. Given the broad spectrum of recorded activities, these databases are typically of mixed type, so it is relevant to evaluate missing data processing methods with this characteristic in mind. In this paper we investigate the performance of several missing data imputation methods and their application to missing environmental data. A computational study was performed to compare the method missForest (MF) with two other imputation methods, namely Multivariate Imputation by Chained Equations (MICE) and K-Nearest Neighbors (KNN). Tests were run on 10 preprocessed datasets of various types. Results revealed that MF generally outperformed MICE and KNN in terms of imputation error, with a more pronounced performance gap for mixed-type databases, where MF reduced the imputation error up to 150 [...] usually the fastest method. MF was then successfully applied to a case study on performance monitoring of Quebec wastewater treatment plants. We believe that the present study demonstrates the pertinence of using MF as an imputation method when dealing with missing environmental data.
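To make the comparison protocol concrete, the sketch below hides known entries of a tiny mixed-type table, imputes them with a simple KNN imputer, and scores the result. It is not the authors' code. The toy table, the Gower-style distance, the helper names (`gower`, `knn_impute`, `imputation_error`), and the choice of NRMSE and PFC (proportion of falsely classified entries) as error measures are all assumptions made for illustration; the abstract only states that imputation errors were compared.

```python
import math
import statistics
from collections import Counter

MISSING = None  # marker for a missing entry


def is_number(value):
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def column_ranges(rows):
    """Observed range of each numeric column; None marks a categorical column."""
    ranges = []
    for j in range(len(rows[0])):
        values = [row[j] for row in rows if is_number(row[j])]
        ranges.append(max(values) - min(values) if values else None)
    return ranges


def gower(a, b, ranges):
    """Mean per-column dissimilarity over columns observed in both rows."""
    total, shared = 0.0, 0
    for x, y, span in zip(a, b, ranges):
        if x is MISSING or y is MISSING:
            continue
        # Numeric: range-scaled absolute difference. Categorical: 0/1 mismatch.
        total += abs(x - y) / (span or 1.0) if span is not None else float(x != y)
        shared += 1
    return total / shared if shared else math.inf


def knn_impute(rows, k=2):
    """Fill each gap from the k nearest rows that have that column observed."""
    ranges = column_ranges(rows)
    out = [list(row) for row in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not MISSING:
                continue
            donors = sorted(
                (gower(row, other, ranges), n)
                for n, other in enumerate(rows)
                if n != i and other[j] is not MISSING
            )
            values = [rows[n][j] for dist, n in donors[:k] if dist < math.inf]
            if not values:
                continue  # no comparable donor; leave the gap
            if ranges[j] is not None:
                out[i][j] = sum(values) / len(values)            # numeric: mean
            else:
                out[i][j] = Counter(values).most_common(1)[0][0]  # categorical: mode
    return out


def imputation_error(truth, imputed, hidden):
    """Return (NRMSE, PFC) over the hidden positions, assuming all were imputed.

    NRMSE = sqrt(mean squared error / variance of the true hidden values).
    PFC = share of hidden categorical entries that received the wrong label.
    A value is None when it is undefined for the given positions.
    """
    true_num, sq_err, cat_wrong = [], [], []
    for i, j in hidden:
        t, p = truth[i][j], imputed[i][j]
        if is_number(t):
            true_num.append(t)
            sq_err.append((t - p) ** 2)
        else:
            cat_wrong.append(t != p)
    nrmse = pfc = None
    if len(true_num) > 1 and statistics.pvariance(true_num) > 0:
        mse = sum(sq_err) / len(sq_err)
        nrmse = math.sqrt(mse / statistics.pvariance(true_num))
    if cat_wrong:
        pfc = sum(cat_wrong) / len(cat_wrong)
    return nrmse, pfc


# Made-up complete table: two numeric columns and one categorical column.
truth = [
    [1.0, 10.0, "a"],
    [1.2, 11.0, "a"],
    [0.8, 9.0, "a"],
    [5.0, 20.0, "b"],
    [5.4, 21.0, "b"],
    [4.6, 19.0, "b"],
]
hidden = [(1, 0), (4, 1), (5, 2)]  # positions hidden on purpose
masked = [list(row) for row in truth]
for i, j in hidden:
    masked[i][j] = MISSING

imputed = knn_impute(masked, k=2)
nrmse, pfc = imputation_error(truth, imputed, hidden)
print(f"NRMSE={nrmse:.3f}  PFC={pfc:.2f}")
```

Because the true values at the hidden positions are known, any imputer with the same input and output shape could be substituted for `knn_impute` and scored by the same `imputation_error` call, which is the general idea behind comparing methods on datasets with artificially removed entries.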
