Distributionally Robust Machine Learning with Multi-source Data

09/05/2023
by Zhenyu Wang, et al.

Classical machine learning methods may lead to poor prediction performance when the target distribution differs from the source populations. This paper uses data from multiple sources and introduces a group distributionally robust prediction model, defined to optimize an adversarial reward based on explained variance over a class of target distributions. Compared to classical empirical risk minimization, the proposed robust prediction model improves prediction accuracy for target populations with distribution shifts. We show that our group distributionally robust prediction model is a weighted average of the source populations' conditional outcome models. We leverage this key identification result to robustify arbitrary machine learning algorithms, including random forests and neural networks. We devise a novel bias-corrected estimator of the optimal aggregation weights for general machine learning algorithms and demonstrate its improved convergence rate. Our proposal can be seen as a distributionally robust federated learning approach that is computationally efficient, is easy to implement with arbitrary machine learning base algorithms, satisfies some privacy constraints, and offers an interpretation of each source's importance for prediction under a given target covariate distribution. We demonstrate the performance of the proposed group distributionally robust method on simulated and real data, with random forests and neural networks as base learning algorithms.
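The weighted-average structure described in the abstract can be illustrated with a small sketch. The abstract does not state how the aggregation weights are defined, so the code below makes a loud assumption: the weights q minimize the quadratic form q'Γq over the probability simplex, where Γ[k][l] is the average product of the predictions of source models k and l over the target covariates. The function names (`gram_matrix`, `simplex_weights`, `aggregate`), the toy source models, and the Frank-Wolfe solver are all hypothetical choices for illustration, and the paper's bias correction is not reproduced here.

```python
def gram_matrix(preds):
    """preds[k][i] is the prediction of source model k at target point i.
    Returns the matrix of average pairwise prediction products."""
    n_sources, n_points = len(preds), len(preds[0])
    return [[sum(a * b for a, b in zip(preds[k], preds[l])) / n_points
             for l in range(n_sources)] for k in range(n_sources)]

def matvec(mat, vec):
    return [sum(m * v for m, v in zip(row, vec)) for row in mat]

def simplex_weights(gamma, max_iter=1000, tol=1e-12):
    """Minimize q' Gamma q over the simplex with Frank-Wolfe and exact
    line search (an assumed solver, not taken from the paper)."""
    k = len(gamma)
    q = [1.0 / k] * k                        # start from uniform weights
    for _ in range(max_iter):
        gq = matvec(gamma, q)                # half the gradient of q' Gamma q
        j = min(range(k), key=lambda i: gq[i])
        d = [(1.0 if i == j else 0.0) - q[i] for i in range(k)]
        slope = sum(di * gi for di, gi in zip(d, gq))
        curv = sum(di * gi for di, gi in zip(d, matvec(gamma, d)))
        if slope > -tol or curv <= 0.0:      # no descent direction left
            break
        step = min(1.0, -slope / curv)       # exact minimizer along d, clipped
        q = [qi + step * di for qi, di in zip(q, d)]
    return q

def aggregate(preds, q):
    """Weighted average of the source models' predictions."""
    return [sum(w * p[i] for w, p in zip(q, preds))
            for i in range(len(preds[0]))]

# Toy example: two hypothetical source models that disagree in sign.
target_x = [-2.0, -1.0, 0.0, 1.0, 2.0]
preds = [[2.0 * x for x in target_x],        # source 1: f(x) = 2x
         [-1.0 * x for x in target_x]]       # source 2: f(x) = -x
weights = simplex_weights(gram_matrix(preds))
robust = aggregate(preds, weights)
```

In this toy setting the two sources imply opposite effects, so under the assumed criterion the weights settle near (1/3, 2/3) and the aggregated prediction shrinks to zero at every target point. In practice the per-source predictions would come from fitted base learners such as random forests or neural networks rather than fixed linear functions.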

research
09/12/2023

Distributionally Robust Transfer Learning

Many existing transfer learning methods rely on leveraging information f...
research
12/28/2017

Kernel Robust Bias-Aware Prediction under Covariate Shift

Under covariate shift, training (source) data and testing (target) data ...
research
12/07/2021

Stabilized Direct Learning for Efficient Estimation of Individualized Treatment Rules

In recent years, the field of precision medicine has seen many advanceme...
research
09/12/2022

Semi-supervised Triply Robust Inductive Transfer Learning

In this work, we propose a semi-supervised triply robust inductive trans...
research
08/10/2022

Doubly Robust Augmented Model Accuracy Transfer Inference with High Dimensional Features

Due to label scarcity and covariate shift happening frequently in real-w...
research
08/14/2019

Towards Linearization Machine Learning Algorithms

This paper is about a machine learning approach based on the multilinear...
research
02/25/2020

A General Method for Robust Learning from Batches

In many applications, data is collected in batches, some of which are co...
