Classification of Imbalanced Data with a Geometric Digraph Family

04/09/2019
by Artür Manukyan, et al.

We use a geometric digraph family called class cover catch digraphs (CCCDs) to tackle the class imbalance problem in statistical classification. CCCDs provide graph-theoretic solutions to the class cover problem and have been employed in classification. We assess the performance of CCCD classifiers through extensive Monte Carlo simulations, comparing them with other classifiers commonly used in the literature. In particular, we show that CCCD classifiers perform relatively well when one class is more frequent than the other in a two-class setting, an example of the class imbalance problem. We also point out the relationship between the class imbalance and class overlap problems, and examine their influence on the performance of CCCD classifiers, of other classification methods, and of some state-of-the-art algorithms that are robust to class imbalance by construction. Experiments on both simulated and real data sets indicate that CCCD classifiers are robust to the class imbalance problem. CCCDs substantially undersample the majority class while preserving information on the discarded points during the undersampling process. Many state-of-the-art methods retain this information by means of ensemble classifiers, whereas CCCDs achieve the same property with a single classifier, which makes them both appealing and fast.
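To make the undersampling idea concrete, the following is a minimal sketch of one way a class cover can be built and used for classification. It is not the authors' implementation. It assumes Euclidean distance, open balls whose radius is the distance from a point to the nearest point of the other class, a greedy selection of covering balls, and classification by the smallest scaled distance d(z, c) / r(c). The function names `greedy_cover` and `classify` and the toy data are hypothetical.

```python
import math

def greedy_cover(target, other):
    """Greedily pick balls centered at target points that cover all of `target`.

    Assumption: each radius is the distance to the nearest point of the
    other class, so an open ball contains no point of the other class.
    """
    radii = [min(math.dist(x, y) for y in other) for x in target]
    # covers[i]: indices of target points inside the open ball of point i.
    # A point always covers itself, even if its radius is zero.
    covers = [
        {j for j, z in enumerate(target) if math.dist(x, z) < r} | {i}
        for i, (x, r) in enumerate(zip(target, radii))
    ]
    uncovered = set(range(len(target)))
    chosen = []
    while uncovered:
        # Pick the ball that covers the most still-uncovered points.
        best = max(range(len(target)), key=lambda i: len(covers[i] & uncovered))
        chosen.append(best)
        uncovered -= covers[best]
    # Only the chosen centers are kept; each radius summarizes the
    # region occupied by the discarded points that the ball covers.
    return [(target[i], radii[i]) for i in chosen]

def classify(z, covers_by_class):
    """Assign z to the class whose ball has the smallest scaled distance."""
    best_label, best_score = None, math.inf
    for label, balls in covers_by_class.items():
        for center, radius in balls:
            score = math.dist(z, center) / radius if radius > 0 else math.inf
            if score < best_score:
                best_label, best_score = label, score
    return best_label

# Toy imbalanced data (hypothetical): 100 majority points on a grid, 5 minority points.
majority = [(i / 9, j / 9) for i in range(10) for j in range(10)]
minority = [(5.0, 5.0), (5.2, 5.0), (5.0, 5.2), (4.8, 5.0), (5.0, 4.8)]

model = {
    "majority": greedy_cover(majority, minority),
    "minority": greedy_cover(minority, majority),
}
print(len(majority), "majority points reduced to", len(model["majority"]), "ball(s)")
print(classify((0.5, 0.5), model), classify((4.9, 5.1), model))
```

In this well-separated toy example the majority class collapses to very few centers, yet the stored radii still describe the region the discarded points occupied, which is the sense in which information is preserved. With overlapping classes the radii shrink and more balls are needed.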
