Impact of Biases in Big Data

03/02/2018
by Patrick Glauner, et al.

The paradigm underlying big data-driven machine learning reflects the desire to derive better conclusions simply by analyzing more data, without having to consider theory and models. But is simply having more data always helpful? In 1936, The Literary Digest collected 2.3M completed questionnaires to predict the outcome of that year's US presidential election. This big data prediction proved to be entirely wrong, whereas George Gallup needed only 3K handpicked people to make an accurate prediction. In general, biases occur in machine learning whenever the distributions of the training set and the test set differ. In this work, we review the different sorts of biases found in (big) data sets in machine learning. We provide definitions and discussions of the most common biases in machine learning: class imbalance and covariate shift. We also show how these biases can be quantified and corrected. This work is an introductory text intended to make both researchers and practitioners more aware of this topic and thus help them derive more reliable models for their learning problems.
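As a rough illustration of the two biases named in the abstract, the sketch below quantifies class imbalance with inverse-frequency class weights and covariate shift with a density ratio over a single discrete feature. It is not taken from the paper: the function names `class_weights` and `importance_weights`, the toy labels, and the "urban"/"rural" feature values are all hypothetical, and the paper's own quantification and correction methods may differ.

```python
from collections import Counter


def class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * class_count).

    Rare classes receive larger weights, so a learner that supports
    per-sample weights is not dominated by the majority class.
    """
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {c: n_samples / (n_classes * cnt) for c, cnt in counts.items()}


def importance_weights(train_x, test_x):
    """Density ratio p_test(x) / p_train(x) for one discrete feature.

    A ratio far from 1 indicates covariate shift for that value; the
    ratio can also be used to reweight training samples.
    Only values seen in the training set are returned.
    """
    train_counts, test_counts = Counter(train_x), Counter(test_x)
    n_train, n_test = len(train_x), len(test_x)
    return {
        x: (test_counts[x] / n_test) / (cnt / n_train)
        for x, cnt in train_counts.items()
    }


# Hypothetical toy data: 90% negatives, 10% positives.
labels = [0] * 90 + [1] * 10
weights = class_weights(labels)

# Hypothetical toy data: training set over-represents "urban" samples.
train_x = ["urban"] * 80 + ["rural"] * 20
test_x = ["urban"] * 50 + ["rural"] * 50
ratios = importance_weights(train_x, test_x)

print(weights)
print(ratios)
```

In this toy setting the minority class and the under-sampled "rural" value both receive weights above 1, which is the intended direction of the correction. Real data would need continuous features and a proper density-ratio estimator, which this sketch does not attempt.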


research
02/13/2017

Is Big Data Sufficient for a Reliable Detection of Non-Technical Losses?

Non-technical losses (NTL) occur during the distribution of electricity ...
research
01/17/2018

On the Reduction of Biases in Big Data Sets for the Detection of Irregular Power Usage

In machine learning, a bias occurs whenever training sets are not repres...
research
12/02/2014

Semantic HMC for Big Data Analysis

Analyzing Big Data can help corporations to improve their efficiency. I...
research
02/03/2020

FAE: A Fairness-Aware Ensemble Framework

Automated decision making based on big data and machine learning (ML) al...
research
06/25/2019

Fast Data: Moving beyond from Big Data's map-reduce

Big Data may not be the solution many are looking for. The latest rise o...
research
07/05/2019

Networkmetrics unraveled: MBDA in Action

We propose networkmetrics, a new data-driven approach for monitoring, tr...
research
09/23/2022

KeypartX: Graph-based Perception (Text) Representation

The availability of big data has opened up big opportunities for individ...
