An Empirical Evaluation of the t-SNE Algorithm for Data Visualization in Structural Engineering

09/18/2021
by   Parisa Hajibabaee, et al.

A fundamental task in machine learning is visualizing high-dimensional data sets that arise in high-impact application domains. The problem becomes much more challenging when the data are large and imbalanced. In this paper, the t-Distributed Stochastic Neighbor Embedding (t-SNE) algorithm is used to reduce the dimensionality of an earthquake-engineering data set for visualization purposes. Because imbalanced data sets greatly affect the accuracy of classifiers, we employ the Synthetic Minority Oversampling Technique (SMOTE) to address the class imbalance. We present the results obtained with t-SNE and SMOTE and compare them with the basic approaches in several respects. Considering four options and six classification algorithms, we show that, when t-SNE is applied to the imbalanced data and SMOTE to the training data set, neural network classifiers give promising results without sacrificing accuracy. Hence, the studied scientific data can be transformed into a two-dimensional (2D) space, enabling visualization of the classifier and the resulting decision surface in a 2D plot.
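To make the oversampling step concrete, the following is a minimal sketch of the interpolation idea behind SMOTE, written in plain Python. It is not the authors' implementation: the function name `smote_like_oversample`, the parameter defaults, and the toy 2D points (standing in for training data after an embedding step) are all assumptions made for illustration. In line with the abstract, only the training set would be oversampled.

```python
import math
import random


def smote_like_oversample(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between minority
    samples and one of their k nearest minority-class neighbours.

    Hypothetical helper for illustration; a simplified take on SMOTE.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority points by Euclidean distance to the base.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        # Pick a random point on the segment from base to neighbour.
        gap = rng.random()
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Made-up 2D training points (assumed to be already embedded in 2D).
majority = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (0.3, 0.2),
            (0.4, 0.0), (0.0, 0.4), (0.2, 0.5), (0.5, 0.3)]
minority = [(2.0, 2.0), (2.2, 1.9), (1.9, 2.3)]

new_points = smote_like_oversample(minority, n_new=len(majority) - len(minority))
balanced_minority = minority + new_points
```

Each synthetic point lies on a line segment between two existing minority samples, so the minority class is enlarged without duplicating points exactly. A test set would be left untouched so that reported accuracy reflects the original class distribution.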


