Data Quality Evaluation using Probability Models

09/14/2020
by Allen ONeill, et al.

This paper discusses an approach that uses machine-learning probability models to distinguish between good and bad data quality in a dataset. A decision tree algorithm is used to predict data quality without any domain knowledge of the datasets under examination. It is shown that, for the data examined, predictions of data quality learned from simple pre-labelled good/bad examples are accurate; however, this approach may not in general be sufficient for useful data quality assessment in production.
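The abstract does not say which features or tree settings were used, so the following is only a minimal sketch under stated assumptions. Each record is treated as a list of string fields. Quality is learned from generic, domain-agnostic features (the hypothetical choices here are the fraction of empty fields, the fraction of fields that parse as numbers, and the mean field length). A small CART-style tree using Gini impurity is then grown from pre-labelled good/bad examples. The feature set, `max_depth`, and every function name are illustrative and are not the paper's method.

```python
from collections import Counter


def _is_number(text):
    """True if the field parses as a float (hypothetical generic test)."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def generic_features(record):
    """Domain-agnostic features of one record (a list of string fields).

    Hypothetical feature set: the paper does not specify its features.
    """
    n = len(record)
    if n == 0:
        return [1.0, 0.0, 0.0]
    missing = sum(1 for f in record if f.strip() == "")
    numeric = sum(1 for f in record if _is_number(f))
    mean_len = sum(len(f) for f in record) / n
    return [missing / n, numeric / n, mean_len]


def gini(labels):
    """Gini impurity of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    return 1.0 - sum((c / total) ** 2 for c in Counter(labels).values())


def best_split(X, y):
    """Return the (feature, threshold) pair that minimises weighted Gini, or None."""
    best, best_score = None, gini(y)
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[j] <= t]
            right = [lab for row, lab in zip(X, y) if row[j] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if score < best_score:
                best, best_score = (j, t), score
    return best


def build_tree(X, y, depth=0, max_depth=3):
    """Grow a small CART-style tree; each leaf holds the majority label."""
    majority = Counter(y).most_common(1)[0][0]
    if depth >= max_depth or len(set(y)) == 1:
        return majority
    split = best_split(X, y)
    if split is None:
        return majority
    j, t = split
    left = [(row, lab) for row, lab in zip(X, y) if row[j] <= t]
    right = [(row, lab) for row, lab in zip(X, y) if row[j] > t]
    return (j, t,
            build_tree([r for r, _ in left], [l for _, l in left], depth + 1, max_depth),
            build_tree([r for r, _ in right], [l for _, l in right], depth + 1, max_depth))


def predict(tree, features):
    """Walk the tree from the root until a leaf label is reached."""
    while isinstance(tree, tuple):
        j, t, left, right = tree
        tree = left if features[j] <= t else right
    return tree


# Usage with invented, pre-labelled toy records (not data from the paper).
train = [
    (["42", "alice", "2020-01-01"], "good"),
    (["17", "bob", "2020-02-03"], "good"),
    (["", "", "2020-03-04"], "bad"),
    (["", "carol", ""], "bad"),
]
X = [generic_features(r) for r, _ in train]
y = [lab for _, lab in train]
tree = build_tree(X, y)
print(predict(tree, generic_features(["99", "dave", "2020-05-06"])))  # → good
```

Because the features ignore what each column means, no domain knowledge is needed to train the tree. The same property hints at its limits: the tree can only reproduce distinctions present in its labelled examples, so a record that is bad in a way the features do not capture (for instance, a plausible but wrong value) would still be classified as good.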
