A Bayesian Approach for Accurate Classification-Based Aggregates

by   Q. A. Meertens, et al.

In this paper, we study the accuracy of values aggregated over classes predicted by a classification algorithm. The problem is that the resulting aggregates (e.g., sums of a variable) are known to be biased. The bias can be large even for highly accurate classification algorithms, in particular when dealing with class-imbalanced data. To correct this bias, the algorithm's classification error rates have to be estimated. In this estimation, two issues arise when applying existing bias correction methods. First, inaccuracies in estimating classification error rates have to be taken into account. Second, impermissible estimates, such as a negative estimate for a positive value, have to be dismissed. We show that both issues are relevant in applications where the true labels are known only for a small set of data points. We propose a novel bias correction method using Bayesian inference. The novelty of our method is that it imposes constraints on the model parameters. We show that our method solves the problem of biased classification-based aggregates as well as the two issues above, in the general setting of multi-class classification. In the empirical evaluation, using a binary classifier on a real-world dataset of company tax returns, we show that our method outperforms existing methods in terms of mean squared error.


page 1

page 2

page 3

page 4


The Dispersion Bias

Estimation error has plagued quantitative finance since Harry Markowitz ...

Learning Debiased Classifier with Biased Committee

Neural networks are prone to be biased towards spurious correlations bet...

A simple correction for covid-19 testing bias

COVID-19 testing studies have become a standard approach for estimating ...

The SAMME.C2 algorithm for severely imbalanced multi-class classification

Classification predictive modeling involves the accurate assignment of o...

Remember to correct the bias when using deep learning for regression!

When training deep learning models for least-squares regression, we cann...

Estimation of testing bias in covid-19

COVID-19 testing studies have become a standard approach for estimating ...

Bayesian Inference of Globular Cluster Properties Using Distribution Functions

We present a Bayesian inference approach to estimating the cumulative ma...

Please sign up or login with your details

Forgot password? Click here to reset