Bayes Imbalance Impact Index: A Measure of Class Imbalanced Dataset for Classification Problem

01/29/2019
by Yang Lu, et al.

Recent studies have shown that the imbalance ratio is not the only cause of performance loss when a classifier is trained on imbalanced data. Other data factors, such as small disjuncts, noise, and class overlap, act in tandem with the imbalance ratio and make the problem difficult. Thus far, empirical studies have only demonstrated the relationship between the imbalance ratio and these other data factors. To the best of our knowledge, there is no measure of the extent to which class imbalance itself influences classification performance on imbalanced data. Further, for a given dataset it is unknown which data factor is the main barrier to classification. In this paper, we focus on the Bayes optimal classifier and study the influence of class imbalance from a theoretical perspective. Accordingly, we propose an instance-level measure called the Individual Bayes Imbalance Impact Index (IBI^3) and a dataset-level measure called the Bayes Imbalance Impact Index (BI^3). IBI^3 and BI^3 reflect the extent of influence attributable purely to imbalance, for each minority class sample and for the whole dataset, respectively. IBI^3 can therefore be used as an instance complexity measure of imbalance, and BI^3 is a criterion that shows the degree to which imbalance deteriorates classification. We can thus use BI^3 to judge whether it is worth applying imbalance recovery methods, such as sampling or cost-sensitive methods, to recover the performance loss of a classifier. Experiments on both synthetic and real benchmark datasets show that IBI^3 is highly consistent with the increase in prediction score obtained by the imbalance recovery methods, and that BI^3 is highly consistent with the improvement in F1 score obtained by those methods.
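
To make the idea of imbalance acting on a Bayes optimal classifier concrete, the following is a minimal sketch, not the paper's definition of IBI^3 or BI^3. It assumes two one-dimensional Gaussian classes with known means and a shared standard deviation (all hypothetical values chosen for illustration), and compares the minority-class posterior of a sample under an imbalanced prior with the posterior under a balanced prior. The function names `minority_posterior` and `posterior_gap` are illustrative only.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of a univariate normal distribution at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def minority_posterior(x, prior_minority, mean_min=1.0, mean_maj=-1.0, std=1.0):
    """Posterior probability of the minority class at x via Bayes' rule.

    The class means and standard deviation are assumed values for this toy
    setup; they do not come from the paper.
    """
    p_min = prior_minority * gaussian_pdf(x, mean_min, std)
    p_maj = (1.0 - prior_minority) * gaussian_pdf(x, mean_maj, std)
    return p_min / (p_min + p_maj)


def posterior_gap(x, prior_minority):
    """How much the minority posterior at x drops relative to a balanced prior.

    This is only an illustrative quantity, not the paper's IBI^3.
    """
    return minority_posterior(x, 0.5) - minority_posterior(x, prior_minority)


if __name__ == "__main__":
    # With a 10% minority prior, compare posteriors at a few sample locations.
    for x in (0.0, 0.5, 2.0):
        balanced = minority_posterior(x, 0.5)
        imbalanced = minority_posterior(x, 0.1)
        print(f"x={x:.1f} balanced={balanced:.3f} imbalanced={imbalanced:.3f} "
              f"gap={posterior_gap(x, 0.1):.3f}")
```

In this toy setup, a sample at x = 0.5 is assigned to the minority class under a balanced prior but to the majority class under a 10% minority prior, which is the kind of per-sample effect of imbalance that an instance-level measure is meant to capture.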


research
06/10/2019

CRCEN: A Generalized Cost-sensitive Neural Network Approach for Imbalanced Classification

Classification on imbalanced datasets is a challenging task in real-worl...
research
07/30/2021

Foundations of data imbalance and solutions for a data democracy

Dealing with imbalanced data is a prevalent problem while performing cla...
research
06/23/2020

Classification Performance Metric for Imbalance Data Based on Recall and Selectivity Normalized in Class Labels

In the classification of a class imbalance dataset, the performance meas...
research
07/11/2022

Partial Resampling of Imbalanced Data

Imbalanced data is a frequently encountered problem in machine learning....
research
02/21/2023

Classification with Trust: A Supervised Approach based on Sequential Ellipsoidal Partitioning

Standard metrics of performance of classifiers, such as accuracy and sen...
research
10/05/2021

Tradeoffs in Streaming Binary Classification under Limited Inspection Resources

Institutions are increasingly relying on machine learning models to iden...
