False perfection in machine prediction: Detecting and assessing circularity problems in machine learning

06/23/2021
by Michael Hagmann, et al.

Machine learning algorithms train models from patterns of input data and target outputs, with the goal of predicting correct outputs for unseen test inputs. Here we demonstrate a problem of machine learning in vital application areas such as medical informatics or patent law: the representations of input data include the very measurements from which the target outputs are deterministically defined. This leads to perfect but circular predictions, based on a machine reconstruction of the known target definition, that fail on real-world data where the defining measurements may be unavailable or only incompletely available. We present a circularity test that shows, for given datasets and black-box machine learning models, whether the target functional definition can be reconstructed and has been used in training. We argue that transferring research results to real-world applications requires avoiding circularity by separating the measurements that define target outcomes from the data representations used in machine learning.
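To make the circularity problem concrete, the following is a minimal sketch on synthetic data. It is not the circularity test proposed in the paper. Everything in it is an assumption made for illustration: the cutoff of 5.0, the names `make_data`, `fit_stump`, and `accuracy`, and the noisy `proxy` feature standing in for measurements that do not define the target. A one-threshold classifier (a decision stump) is trained once on the defining measurement and once on the proxy alone.

```python
import random


def make_data(n, rng, cutoff=5.0):
    """Synthetic rows (defining_measurement, proxy, label).

    The label is a deterministic function of the defining measurement
    (hypothetical rule: label = 1 if measurement > cutoff). The proxy is
    the same measurement plus Gaussian noise, i.e. a correlated feature
    that does not define the label.
    """
    rows = []
    for _ in range(n):
        m = rng.uniform(0.0, 10.0)
        proxy = m + rng.gauss(0.0, 2.0)
        y = 1 if m > cutoff else 0
        rows.append((m, proxy, y))
    return rows


def accuracy(threshold, values, labels):
    """Share of examples where 'value > threshold' matches the label."""
    hits = sum((v > threshold) == bool(y) for v, y in zip(values, labels))
    return hits / len(labels)


def fit_stump(values, labels):
    """Pick the threshold (midpoint between sorted values) with the best
    training accuracy for the rule 'predict 1 if value > threshold'."""
    ordered = sorted(set(values))
    candidates = [(a + b) / 2 for a, b in zip(ordered, ordered[1:])]
    return max(candidates, key=lambda t: accuracy(t, values, labels))


rng = random.Random(0)  # fixed seed so the sketch is repeatable
train = make_data(500, rng)
test = make_data(500, rng)

train_m, train_proxy, train_y = zip(*train)
test_m, test_proxy, test_y = zip(*test)

# Circular setting: the defining measurement is part of the input.
t_circular = fit_stump(train_m, train_y)
acc_circular = accuracy(t_circular, test_m, test_y)

# Non-circular setting: only the noisy proxy is available.
t_proxy = fit_stump(train_proxy, train_y)
acc_proxy = accuracy(t_proxy, test_proxy, test_y)

print(f"learned threshold on defining measurement: {t_circular:.3f}")
print(f"test accuracy with defining measurement:   {acc_circular:.3f}")
print(f"test accuracy with proxy only:             {acc_proxy:.3f}")
```

In this toy setting the stump trained on the defining measurement essentially recovers the cutoff that defines the label, which is why its test accuracy is close to perfect, while the proxy-only model is clearly less accurate. The near-perfect score reflects a reconstruction of the known definition, not predictive skill that would carry over to data where the defining measurement is missing.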

research
06/28/2020

Modeling Generalization in Machine Learning: A Methodological and Computational Study

As machine learning becomes more and more available to the general publi...
research
04/08/2023

SimbaML: Connecting Mechanistic Models and Machine Learning with Augmented Data

Training sophisticated machine learning (ML) models requires large datas...
research
07/26/2018

High Dimensional Model Representation as a Glass Box in Supervised Machine Learning

Prediction and explanation are key objects in supervised machine learnin...
research
06/18/2019

Declarative Learning-Based Programming as an Interface to AI Systems

Data-driven approaches are becoming more common as problem-solving techn...
research
12/11/2019

Graph Input Representations for Machine Learning Applications in Urban Network Analysis

Understanding and learning the characteristics of network paths has been...
research
04/21/2022

Robustness of Machine Learning Models Beyond Adversarial Attacks

Correctly quantifying the robustness of machine learning models is a cen...
research
01/09/2019

High Fidelity Vector Space Models of Structured Data

Machine learning systems regularly deal with structured data in real-wor...
