The All Relevant Feature Selection using Random Forest

06/25/2011
by Miron B. Kursa, et al.

In this paper we examine the application of the random forest classifier to the all-relevant feature selection problem. To this end, we first examine two recently proposed all-relevant feature selection algorithms, both random forest wrappers, on a series of synthetic data sets of varying size. We show that reasonable prediction accuracy can be achieved and that heuristic algorithms designed to handle the all-relevant problem perform close to the reference ideal algorithm. Then we apply one of the algorithms to four families of semi-synthetic data sets to assess how the properties of a particular data set influence the results of feature selection. Finally, we test the procedure on a well-known gene expression data set. The relevance of nearly all previously established important genes was confirmed; moreover, the relevance of several new ones was discovered.

research
06/16/2011

Random forest models of the retention constants in the thin layer chromatography

In the current study we examine an application of the machine learning m...
research
04/01/2020

Sequential Feature Classification in the Context of Redundancies

The problem of all-relevant feature selection is concerned with finding ...
research
02/03/2014

Applying Supervised Learning Algorithms and a New Feature Selection Method to Predict Coronary Artery Disease

From a fresh data science perspective, this thesis discusses the predict...
research
01/12/2011

Review and Evaluation of Feature Selection Algorithms in Synthetic Problems

The main purpose of Feature Subset Selection is to find a reduced subset...
research
09/04/2023

OutRank: Speeding up AutoML-based Model Search for Large Sparse Data sets with Cardinality-aware Feature Ranking

The design of modern recommender systems relies on understanding which p...
research
04/05/2023

Opening the random forest black box by the analysis of the mutual impact of features

Random forest is a popular machine learning approach for the analysis of...
research
10/26/2020

Data Mining Ice Cubes

IceCube is a 1 km³ scale neutrino telescope located at the geographic So...
