How is a data-driven approach better than random choice in label space division for multi-label classification?

06/07/2016
by   Piotr Szymański, et al.

We propose using five data-driven community detection approaches from social networks to partition the label space for multi-label classification, as an alternative to the random partitioning into equal subsets performed by RAkELd. The five approaches are modularity-maximizing fastgreedy and leading eigenvector, infomap, walktrap, and label propagation. We construct a label co-occurrence graph (in both weighted and unweighted versions) from the training data and perform community detection on it to partition the label set. We include the Binary Relevance and Label Powerset classification methods for comparison, and we use Gini-index-based Decision Trees as the base classifier. We compare educated approaches to label space division against random baselines on 12 benchmark data sets over five evaluation measures. We show that in almost all cases seven educated guess approaches are more likely to outperform RAkELd than not, in all measures except Hamming Loss. Fastgreedy and walktrap community detection on weighted label co-occurrence graphs are 85-92% more likely to yield better F1 scores than random partitioning. Infomap on the unweighted label co-occurrence graph is on average better than random partitioning 90% of the time in terms of Subset Accuracy and 89% of the time in terms of Jaccard similarity. Weighted fastgreedy is better on average than RAkELd when it comes to Hamming Loss.
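The pipeline described in the abstract (build a label co-occurrence graph from training data, then detect communities to obtain a label partition) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration, not the authors' implementation: the function names `cooccurrence_graph` and `label_propagation` are hypothetical, the toy label sets are invented, and the community detection step is a simplified, deterministic variant of label propagation (fixed node order, ties broken by the smallest community id) rather than any of the library implementations the paper evaluates.

```python
from collections import defaultdict
from itertools import combinations


def cooccurrence_graph(label_sets, weighted=True):
    """Build {label: {neighbor: weight}} from the label set of each training sample."""
    graph = defaultdict(dict)
    for labels in label_sets:
        for label in labels:
            graph.setdefault(label, {})  # keep labels that never co-occur as isolated nodes
        for a, b in combinations(sorted(labels), 2):
            # Weighted: number of samples in which a and b appear together.
            # Unweighted: every observed pair gets weight 1.
            w = graph[a].get(b, 0) + 1 if weighted else 1
            graph[a][b] = w
            graph[b][a] = w
    return dict(graph)


def label_propagation(graph, max_rounds=100):
    """Simplified deterministic label propagation (illustrative assumption).

    Each node repeatedly adopts the community with the largest total edge
    weight among its neighbors; ties go to the smallest community id.
    """
    community = {node: node for node in graph}
    for _ in range(max_rounds):
        changed = False
        for node in sorted(graph):
            if not graph[node]:
                continue  # isolated label stays in its own community
            votes = defaultdict(int)
            for neighbor, weight in graph[node].items():
                votes[community[neighbor]] += weight
            top = max(votes.values())
            best = min(c for c, v in votes.items() if v == top)
            if best != community[node]:
                community[node] = best
                changed = True
        if not changed:
            break
    clusters = defaultdict(set)
    for node, c in community.items():
        clusters[c].add(node)
    return sorted(clusters.values(), key=min)


# Invented toy data: each set holds the labels assigned to one training sample.
label_sets = [{0, 1}, {0, 1, 2}, {1, 2}, {3, 4}, {3, 4}, {2, 3}]
partition = label_propagation(cooccurrence_graph(label_sets))
print(partition)
```

In a RAkELd-style setup, each resulting label subset would then get its own Label Powerset classifier; the only change relative to RAkELd is that the subsets come from the graph rather than from a random draw.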

research
02/13/2017

Is a Data-Driven Approach still Better than Random Choice with Naive Bayes classifiers?

We study the performance of data-driven, a priori and random approaches ...
research
02/05/2017

A scikit-based Python environment for performing multi-label classification

Scikit-multilearn is a Python library for performing multi-label classif...
research
04/27/2017

A Network Perspective on Stratification of Multi-Label Data

In the recent years, we have witnessed the development of multi-label cl...
research
02/15/2017

Nearest Labelset Using Double Distances for Multi-label Classification

Multi-label classification is a type of supervised learning where an ins...
research
11/16/2020

Multi-label classification: do Hamming loss and subset accuracy really conflict with each other?

Various evaluation measures have been developed for multi-label classifi...
research
07/12/2022

Edge Augmentation on Disconnected Graphs via Eigenvalue Elevation

The graph-theoretical task of determining most likely inter-community ed...
research
08/02/2021

Data-driven Clustering in Ad-hoc Networks based on Community Detection

High demands for industrial networks lead to increasingly large sensor n...
