Measuring the effects of confounders in medical supervised classification problems: the Confounding Index (CI)

by Elisa Ferrari, et al.

Over the years, there has been growing interest in using Machine Learning techniques for biomedical data processing. When tackling these tasks, one needs to bear in mind that biomedical data depend on a variety of characteristics, such as demographic aspects (age, gender, etc.) or the acquisition technology, which might be unrelated to the target of the analysis. In supervised tasks, failing to match the ground truth targets with respect to such characteristics, called confounders, may lead to very misleading estimates of the predictive performance. Many strategies have been proposed to handle confounders, ranging from data selection, to normalization techniques, up to the use of training algorithms for learning with imbalanced data. However, all these solutions require the confounders to be known a priori. To address this, we introduce a novel index that measures the confounding effect of a data attribute in a bias-agnostic way. This index can be used to quantitatively compare the confounding effects of different variables and to inform correction methods such as normalization procedures or ad hoc learning algorithms. The effectiveness of this index is validated on both simulated data and real-world neuroimaging data.




