Radial-Based Undersampling for Imbalanced Data Classification

06/02/2019
by Michał Koziarski, et al.

Data imbalance remains one of the most widespread problems affecting contemporary machine learning. Its negative effect on traditional learning algorithms is most severe in combination with other dataset difficulty factors, such as small disjuncts, the presence of outliers, and an insufficient number of training observations. These difficulty factors can also limit the applicability of some methods for dealing with data imbalance, in particular the neighborhood-based oversampling algorithms derived from SMOTE. Radial-Based Oversampling (RBO) was previously proposed to mitigate some of the limitations of the neighborhood-based methods. In this paper we examine whether the concept of mutual class potential, used to guide the oversampling process in RBO, can also be used in an undersampling procedure. A computational complexity analysis indicates a significantly reduced time complexity of the proposed Radial-Based Undersampling algorithm, and the results of an experimental study indicate its usefulness, especially on difficult datasets.
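
To make the idea of potential-guided undersampling more concrete, the following is a minimal sketch, not the authors' implementation. The abstract does not give the formula or the removal rule, so several things here are assumptions: the potential is taken to be a sum of Gaussian radial basis functions centred on majority observations minus the same sum over minority observations, the spread parameter `gamma` and the function names are hypothetical, and majority observations with the highest potential are dropped in a single pass (without recomputing potentials) until the classes are balanced.

```python
import numpy as np


def mutual_class_potential(point, majority, minority, gamma=0.5):
    """Assumed potential: Gaussian RBF mass of the majority class at `point`
    minus the Gaussian RBF mass of the minority class at `point`."""
    maj_dist = np.linalg.norm(majority - point, axis=1)
    min_dist = np.linalg.norm(minority - point, axis=1)
    maj_mass = np.exp(-(maj_dist / gamma) ** 2).sum()
    min_mass = np.exp(-(min_dist / gamma) ** 2).sum()
    return maj_mass - min_mass


def radial_undersample(majority, minority, gamma=0.5):
    """Keep the majority observations with the lowest potential (assumed rule)
    so that both classes end up with the same number of observations."""
    potentials = np.array(
        [mutual_class_potential(x, majority, minority, gamma) for x in majority]
    )
    n_keep = min(len(majority), len(minority))
    # argsort is ascending, so the first n_keep indices have the lowest potential
    keep = np.sort(np.argsort(potentials)[:n_keep])
    return majority[keep]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X_maj = rng.normal(0.0, 1.0, size=(20, 2))  # synthetic majority class
    X_min = rng.normal(3.0, 1.0, size=(5, 2))   # synthetic minority class
    X_maj_reduced = radial_undersample(X_maj, X_min, gamma=0.5)
    print(X_maj_reduced.shape)
```

Under these assumptions the potential is evaluated once per majority observation, each evaluation touching every training observation, so the sketch runs in time quadratic in the dataset size.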


