Handling Imbalanced Dataset in Multi-label Text Categorization using Bagging and Adaptive Boosting

10/27/2018
by Genta Indra Winata, et al.

Imbalanced datasets arise from the uneven distribution of real-world data, such as the disposition of complaints to government offices in Bandung. Consequently, multi-label text categorization algorithms may not achieve their best performance, because classifiers tend to be dominated by the majority of the data and ignore the minority. In this paper, Bagging and Adaptive Boosting algorithms are employed to handle the issue and improve the performance of text categorization. The results are evaluated with four metrics: Hamming loss, subset accuracy, example-based accuracy, and micro-averaged F-measure. Bagging.ML-LP with an SMO weak classifier is the best performer in terms of subset accuracy and example-based accuracy. Bagging.ML-BR with an SMO weak classifier has the best micro-averaged F-measure of all the configurations. On the other hand, AdaBoost.MH with a J48 weak classifier has the lowest Hamming loss. Thus, both algorithms have high potential to improve the performance of text categorization, but only with certain weak classifiers. However, bagging has more potential than adaptive boosting for increasing the accuracy of minority labels.
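The four evaluation metrics named in the abstract can be illustrated with a minimal Python sketch. It uses the commonly cited definitions of these measures; the abstract does not spell out the paper's exact formulas, so treat the conventions here (for example, scoring an example with empty true and predicted label sets as 1.0) as assumptions. The label names and the toy predictions are hypothetical and are not taken from the paper's data.

```python
from typing import List, Set

LabelSets = List[Set[str]]


def hamming_loss(y_true: LabelSets, y_pred: LabelSets, n_labels: int) -> float:
    """Fraction of (example, label) slots that are predicted wrongly. Lower is better."""
    wrong = sum(len(t ^ p) for t, p in zip(y_true, y_pred))  # symmetric difference
    return wrong / (len(y_true) * n_labels)


def subset_accuracy(y_true: LabelSets, y_pred: LabelSets) -> float:
    """Fraction of examples whose predicted label set matches exactly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def example_based_accuracy(y_true: LabelSets, y_pred: LabelSets) -> float:
    """Mean per-example Jaccard overlap between true and predicted label sets."""
    scores = []
    for t, p in zip(y_true, y_pred):
        union = t | p
        # Assumption: two empty label sets count as a perfect match.
        scores.append(len(t & p) / len(union) if union else 1.0)
    return sum(scores) / len(scores)


def micro_f_measure(y_true: LabelSets, y_pred: LabelSets) -> float:
    """F-measure computed from counts pooled over all examples and labels."""
    tp = sum(len(t & p) for t, p in zip(y_true, y_pred))
    fp = sum(len(p - t) for t, p in zip(y_true, y_pred))
    fn = sum(len(t - p) for t, p in zip(y_true, y_pred))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Hypothetical complaint categories and predictions, for illustration only.
y_true = [{"roads"}, {"roads", "health"}, {"education"}, {"health"}]
y_pred = [{"roads"}, {"roads"}, {"education", "health"}, {"roads"}]

print("Hamming loss:          ", hamming_loss(y_true, y_pred, n_labels=3))
print("Subset accuracy:       ", subset_accuracy(y_true, y_pred))
print("Example-based accuracy:", example_based_accuracy(y_true, y_pred))
print("Micro-averaged F:      ", micro_f_measure(y_true, y_pred))
```

Because Hamming loss counts individual label slots while subset accuracy requires an exact match of the whole label set, a configuration can do well on one and poorly on the other, which is consistent with different classifiers leading on different metrics in the abstract.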

