Distance Correlation Sure Independence Screening for Accelerated Feature Selection in Parkinson's Disease Vocal Data

06/23/2020
by Dan Schellhas, et al.

With the abundance of machine learning methods available and the temptation of using them all in an ensemble, a model-agnostic method of feature selection is incredibly alluring. Principal component analysis, developed in 1901, has been a strong contender in this role ever since, but it is ultimately an unsupervised method: it offers no guarantee that the selected features have good predictive power because it does not know what is being predicted. To this end, Peng et al. developed the minimum redundancy-maximum relevance (mRMR) method in 2005, which uses not only the mutual information between predictors but also the mutual information with the response. Estimating mutual information and entropy, however, tends to be expensive and problematic, leading to excessive processing times even for a dataset of approximately 750 by 750 in a Leave-One-Subject-Out jackknife setting. To remedy this, we use a method from 2012 called Distance Correlation Sure Independence Screening (DC-SIS), which uses the distance correlation measure of Székely et al. to select the features with the greatest dependence on the response. We show that on Parkinson's Disease vocal diagnosis data, DC-SIS produces results statistically indistinguishable from those of the mRMR selection method while running 90 times faster.
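As a rough illustration of the screening idea described in the abstract, the sketch below computes the sample distance correlation between each candidate feature and the response and keeps the top-ranked features. This is a minimal NumPy implementation written for this summary under stated assumptions, not the authors' code; the names `dist_corr`, `dc_sis`, the cutoff `d`, and the synthetic data are illustrative.

```python
import numpy as np

def dist_corr(x, y):
    """Sample distance correlation between two 1-D arrays (Székely et al.)."""
    x = np.asarray(x, dtype=float).reshape(-1, 1)
    y = np.asarray(y, dtype=float).reshape(-1, 1)
    a = np.abs(x - x.T)                                   # pairwise distances for x
    b = np.abs(y - y.T)                                   # pairwise distances for y
    A = a - a.mean(0) - a.mean(1)[:, None] + a.mean()     # double centring
    B = b - b.mean(0) - b.mean(1)[:, None] + b.mean()
    dcov2 = (A * B).mean()                                # squared distance covariance
    denom = np.sqrt((A * A).mean() * (B * B).mean())      # product of distance variances
    return float(np.sqrt(dcov2 / denom)) if denom > 0 else 0.0

def dc_sis(X, y, d):
    """DC-SIS style screening: rank features by distance correlation with y, keep top d."""
    scores = np.array([dist_corr(X[:, j], y) for j in range(X.shape[1])])
    keep = np.argsort(scores)[::-1][:d]
    return keep, scores

# Hypothetical usage on data of roughly the size mentioned in the abstract (750 x 750).
rng = np.random.default_rng(0)
X = rng.normal(size=(750, 750))
y = X[:, 0] + 0.5 * X[:, 3] + rng.normal(scale=0.1, size=750)
keep, scores = dc_sis(X, y, d=20)
print(keep[:5])
```

Because each feature is scored only by its marginal dependence with the response, the screening loop is embarrassingly parallel and avoids the entropy estimation that makes mRMR costly, which is consistent with the speed advantage reported above.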
