Label-Descriptive Patterns and their Application to Characterizing Classification Errors

10/18/2021
by Michael Hedderich, et al.

State-of-the-art deep learning methods achieve human-like performance on many tasks, but they still make errors. Characterizing these errors in easily interpretable terms not only reveals whether a model is prone to making systematic errors, but also gives a way to act on them and improve the model. In this paper we propose a method that does so for arbitrary classifiers by mining a small set of patterns that together succinctly describe the input data, partitioned according to whether each prediction is correct. We show that this is an instance of the more general label description problem, which we formulate in terms of the Minimum Description Length principle. To discover good pattern sets, we propose the efficient and hyperparameter-free Premise algorithm. Through an extensive set of experiments on both synthetic and real-world data, we show that it performs very well in practice: unlike existing solutions, it ably recovers ground-truth patterns, even on highly imbalanced data over many unique items, or where patterns are only weakly associated with labels. Through two real-world case studies, we confirm that Premise gives clear and actionable insight into the systematic errors made by modern NLP classifiers.
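The abstract frames error characterization as a label description problem scored with the Minimum Description Length principle. As a rough illustration of that framing only (this is not the Premise algorithm or its encoding, neither of which the abstract specifies), the Python sketch below labels each input by whether a classifier's prediction was correct and scores one candidate pattern by the bits it saves when encoding those labels, minus a crude cost for describing the pattern itself. The function names, the per-item model cost of log2(vocabulary size) bits, and the toy data are all hypothetical assumptions.

```python
import math
from collections import Counter


def label_cost_bits(labels):
    """Bits needed to encode a label sequence with an optimal code for its own
    empirical distribution (n times the Shannon entropy). Empty input costs 0."""
    n = len(labels)
    counts = Counter(labels)
    return -sum(c * math.log2(c / n) for c in counts.values())


def pattern_gain_bits(instances, labels, pattern):
    """Toy two-part MDL score for a single pattern (a set of items).

    Hypothetical encoding, not the paper's: the data cost encodes labels
    separately for instances that contain the pattern and for those that do
    not; the model cost charges log2(vocabulary size) bits per pattern item.
    A positive result means the pattern "pays for itself".
    """
    vocab = set().union(*instances)
    model_cost = len(pattern) * math.log2(len(vocab))
    inside = [y for x, y in zip(instances, labels) if pattern <= x]
    outside = [y for x, y in zip(instances, labels) if not pattern <= x]
    baseline = label_cost_bits(labels)
    data_saving = baseline - (label_cost_bits(inside) + label_cost_bits(outside))
    return data_saving - model_cost


# Hypothetical toy data: token sets from a sentiment classifier's inputs,
# labelled by whether the model's prediction on that input was correct.
instances = [
    {"good", "movie"}, {"not", "good"}, {"great", "plot"}, {"not", "great"},
    {"bad", "acting"}, {"not", "bad"}, {"fine", "film"}, {"nice", "cast"},
]
labels = ["correct", "error", "correct", "error",
          "correct", "error", "correct", "correct"]

# Score every single-item pattern and report the best one.
scores = {item: pattern_gain_bits(instances, labels, {item})
          for item in set().union(*instances)}
best = max(scores, key=scores.get)
print(best, round(scores[best], 2))  # not 4.18
```

In this toy data the token "not" separates every error from every correct prediction, so it is the only single-item pattern whose label savings exceed its description cost. A real method would search over multi-item patterns and choose a whole set of them jointly; this sketch scores each pattern in isolation.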

