Reducing Data Complexity using Autoencoders with Class-informed Loss Functions

11/11/2021
by David Charte, et al.

Available data in machine learning applications is becoming increasingly complex, due to higher dimensionality and difficult classes. There exists a wide variety of approaches to measuring the complexity of labeled data, according to class overlap, separability or boundary shapes, as well as group morphology. Many techniques can transform the data in order to find better features, but few focus specifically on reducing data complexity. Most data transformation methods mainly address dimensionality, leaving aside the information available in class labels, which can be useful when classes are complex in some way. This paper proposes an autoencoder-based approach to complexity reduction, using class labels to inform the loss function about the adequacy of the generated variables. This leads to three new feature learners, Scorer, Skaler and Slicer, based on Fisher's discriminant ratio, the Kullback-Leibler divergence and least-squares support vector machines, respectively. They can be applied as a preprocessing stage for a binary classification problem. A thorough experimentation across a collection of 27 datasets and a range of complexity and classification metrics shows that class-informed autoencoders perform better than 4 other popular unsupervised feature extraction techniques, especially when the final objective is using the data for a classification task.
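The abstract says Scorer informs its loss with Fisher's discriminant ratio but does not give the formula. As a minimal sketch, assuming the standard per-feature Fisher ratio for two classes and a hypothetical penalty 1 / (1 + ratio) added to the reconstruction error with a weight `weight`, a class-informed loss could be evaluated as follows. The names `fisher_ratio` and `class_informed_loss`, the penalty form and the averaging over encoded features are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def fisher_ratio(codes: np.ndarray, labels: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Per-feature Fisher's discriminant ratio for a binary problem:
    (mean_0 - mean_1)^2 / (var_0 + var_1), one value per column of `codes`."""
    class0 = codes[labels == 0]
    class1 = codes[labels == 1]
    numerator = (class0.mean(axis=0) - class1.mean(axis=0)) ** 2
    denominator = class0.var(axis=0) + class1.var(axis=0) + eps  # eps avoids division by zero
    return numerator / denominator


def class_informed_loss(x, x_rec, codes, labels, weight=1.0):
    """Hypothetical loss (an assumption, not the paper's exact definition):
    reconstruction MSE plus a class-informed penalty.

    The penalty 1 / (1 + ratio) is small when an encoded feature separates
    the two classes well and equals 1 when it does not, so minimizing the
    loss favours codes that both reconstruct x and separate the classes.
    """
    reconstruction = np.mean((x - x_rec) ** 2)
    penalty = np.mean(1.0 / (1.0 + fisher_ratio(codes, labels)))
    return reconstruction + weight * penalty


# Toy binary data: column 0 separates the classes, column 1 does not.
x = np.array([[0.0, 0.0], [1.0, 1.0], [3.0, 0.0], [4.0, 1.0]])
y = np.array([0, 0, 1, 1])

print(fisher_ratio(x, y))               # approximately [18, 0]
print(class_informed_loss(x, x, x, y))  # approximately 0.5263 (= 10/19)
```

With perfect reconstruction, a one-dimensional code taken from column 0 gets a penalty of about 1/19, while a code taken from column 1 gets the maximum penalty of 1. In an actual feature learner this loss would be minimized over the encoder and decoder weights with an automatic-differentiation framework; this NumPy version only evaluates it for fixed codes.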
