Classification with many classes: challenges and pluses

06/04/2015
by Felix Abramovich, et al.

The objective of the paper is to study the accuracy of multi-class classification in a high-dimensional setting where the number of classes is also large (the "large L, large p, small n" model). While this problem arises in many practical applications and many techniques have recently been developed for its solution, to the best of our knowledge nobody has provided a rigorous theoretical analysis of this important setup. The purpose of the present paper is to fill this gap. We consider one of the most common settings, classification of high-dimensional normal vectors, where, unlike under standard assumptions, the number of classes could be large. We derive non-asymptotic conditions on the effects of significant features, as well as lower and upper bounds on the distances between classes, required for successful feature selection and classification with a given accuracy. Furthermore, we study an asymptotic setup where the number of classes grows with the dimension of the feature space while the number of samples per class is possibly limited. We discover an interesting and, at first glance, somewhat counter-intuitive phenomenon: a large number of classes may be a "blessing" rather than a "curse" since, in certain settings, the precision of classification can improve as the number of classes grows. This is due to more accurate feature selection, since even weaker significant features, which are not sufficiently strong to be manifested in a coarse classification, can nevertheless have a strong impact when the number of classes is large. We supplement our theoretical investigation with a simulation study and a real data example, where we again observe the above phenomenon.
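The feature-selection effect described above can be illustrated with a small simulation. This is only a sketch under assumptions that are not taken from the paper: unit-variance Gaussian noise, significant-feature class means drawn from N(0, tau^2), and a simple "keep the s largest between-class sums of squares" rule standing in for whatever selection procedure the authors actually analyze. The function name `selection_recovery` and all parameter values are hypothetical.

```python
import numpy as np

def selection_recovery(L, p=500, s=10, n=5, tau=0.4, seed=0):
    """Fraction of the s truly significant features found among the
    s features with the largest between-class sum of squares.

    Assumptions (illustrative, not from the paper): L classes, p features,
    n observations per class, unit noise variance, and class means of the
    first s features drawn i.i.d. from N(0, tau^2); all other means are 0.
    """
    rng = np.random.default_rng(seed)

    # True class means: only the first s features differ between classes.
    means = np.zeros((L, p))
    means[:, :s] = rng.normal(0.0, tau, size=(L, s))

    # The mean of n i.i.d. N(m, 1) observations is N(m, 1/n), so the
    # per-class sample means are simulated directly.
    ybar = means + rng.normal(0.0, 1.0 / np.sqrt(n), size=(L, p))

    # Between-class sum of squares for each feature.
    grand = ybar.mean(axis=0)
    stat = n * ((ybar - grand) ** 2).sum(axis=0)

    # Keep the s features with the largest statistic and count true hits.
    top = np.argsort(stat)[-s:]
    return float(np.mean(top < s))

if __name__ == "__main__":
    for L in (2, 10, 50, 200):
        print(L, selection_recovery(L))
```

With the same weak per-class effects, the between-class statistic of a significant feature accumulates signal over all L classes, so it separates from the noise features more clearly as L grows. That is the intuition behind the "blessing" in this toy setting; the paper's own conditions and bounds are more refined than this sketch.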


