Hellinger Distance Trees for Imbalanced Streams

05/09/2014
by R. J. Lyon, et al.

Classifiers trained on data sets with an imbalanced class distribution are known to generalise poorly. This is known as the imbalanced learning problem. The problem becomes particularly acute for incremental classifiers operating on imbalanced data streams, especially when the learning objective is rare class identification. Because accuracy can give a misleading impression of performance on imbalanced data, existing stream classifiers based on accuracy can perform poorly on the minority class of an imbalanced stream, resulting in low minority class recall rates. In this paper we address this deficiency by proposing the use of the Hellinger distance measure as a very fast decision tree split criterion. We demonstrate that using the Hellinger distance yields a statistically significant improvement in recall rates on imbalanced data streams, with an acceptable increase in the false positive rate.
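To make the idea of a Hellinger-based split criterion concrete, the following is a minimal sketch of one commonly used discrete form: the Hellinger distance between the distributions of the two classes over the branches of a candidate split. It is an illustration under assumptions, not the paper's implementation. The function name `hellinger_split_score` and its count-based inputs are hypothetical, and the formulation used in the full text for streaming data may differ.

```python
import math


def hellinger_split_score(pos_counts, neg_counts):
    """Score a candidate split by the Hellinger distance between classes.

    pos_counts[j] and neg_counts[j] are the numbers of minority (positive)
    and majority (negative) examples falling into branch j of the split.
    Hypothetical helper for illustration; not taken from the paper.
    """
    pos_total = sum(pos_counts)
    neg_total = sum(neg_counts)
    if pos_total == 0 or neg_total == 0:
        # With only one class observed, no split can separate anything.
        return 0.0

    total = 0.0
    for p, n in zip(pos_counts, neg_counts):
        # Each class is normalised by its own total, so the score depends
        # on how each class is distributed across branches, not on the
        # ratio between class sizes.
        total += (math.sqrt(p / pos_total) - math.sqrt(n / neg_total)) ** 2
    return math.sqrt(total)


# A split that perfectly separates 10 positives from 1000 negatives
# reaches the maximum possible score, sqrt(2).
perfect = hellinger_split_score([10, 0], [0, 1000])

# A split that distributes both classes identically scores 0.
useless = hellinger_split_score([5, 5], [500, 500])
```

Because each class is normalised separately, multiplying the majority class counts by a constant leaves the score unchanged. This is the property that makes such a criterion attractive when the class ratio is heavily skewed, in contrast with accuracy-driven criteria.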

