Multi-dimensional domain generalization with low-rank structures

09/18/2023
by Sai Li, et al.

Conventional statistical and machine learning methods typically assume that the test data follow the same distribution as the training data. However, this assumption does not always hold, especially in applications where the target population is not well represented in the training data. This is a notable issue in health-related studies, where specific ethnic populations may be underrepresented, posing a significant challenge for researchers aiming to make statistical inferences about these minority groups. In this work, we present a novel approach to addressing this challenge in linear regression models. We organize the model parameters for all the sub-populations into a tensor. By studying a structured tensor completion problem, we achieve robust domain generalization, i.e., learning about sub-populations with limited or no available data. Our method leverages the structure of group labels in a novel way and produces more reliable and interpretable generalization results. We establish rigorous theoretical guarantees for the proposed method and demonstrate its minimax optimality. To validate the effectiveness of our approach, we conduct extensive numerical experiments and a real data study focused on education level prediction for multiple ethnic groups, comparing our results with those obtained using existing methods.
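To make the idea of arranging sub-population parameters in a tensor and completing the missing cells more concrete, here is a minimal toy sketch. It is not the estimator proposed in the paper. It assumes two group labels with 3 and 2 levels, regression coefficients of dimension 4, and a purely additive structure `B[i, j] = u[i] + v[j]`, which is one simple special case of a low-rank coefficient tensor. The function name `complete_additive` and all variable names are hypothetical. In practice the observed cells would hold per-group regression estimates; here they are noise-free for clarity.

```python
import numpy as np

def complete_additive(B, observed):
    """Fill unobserved cells of a coefficient tensor B (shape G1 x G2 x p),
    assuming B[i, j] = u[i] + v[j] with u[i], v[j] in R^p (an illustrative
    assumption, not the paper's model)."""
    G1, G2, p = B.shape
    cells = [(i, j) for i in range(G1) for j in range(G2) if observed[i, j]]
    # Design matrix: one indicator column per level of each group label.
    A = np.zeros((len(cells), G1 + G2))
    for r, (i, j) in enumerate(cells):
        A[r, i] = 1.0
        A[r, G1 + j] = 1.0
    Y = np.stack([B[i, j] for i, j in cells])      # shape (n_observed, p)
    # The design is rank-deficient; lstsq returns the minimum-norm solution.
    # The sums u[i] + v[j] are still uniquely determined as long as the
    # observed cells connect all levels of both labels.
    coef, *_ = np.linalg.lstsq(A, Y, rcond=None)
    u, v = coef[:G1], coef[G1:]
    filled = u[:, None, :] + v[None, :, :]
    # Keep observed cells as they are; use the fitted structure elsewhere.
    return np.where(observed[:, :, None], B, filled)

# Toy data: 3 x 2 sub-populations, 4 coefficients each.
rng = np.random.default_rng(0)
u_true = rng.normal(size=(3, 4))
v_true = rng.normal(size=(2, 4))
B_true = u_true[:, None, :] + v_true[None, :, :]

observed = np.ones((3, 2), dtype=bool)
observed[2, 1] = False                             # sub-population with no data
B_obs = np.where(observed[:, :, None], B_true, np.nan)

B_hat = complete_additive(B_obs, observed)
print(np.allclose(B_hat[2, 1], B_true[2, 1]))
```

The unobserved cell is recovered only because the other cells share its group-label levels; this is the sense in which the structure of the group labels carries information to sub-populations without data.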

research
01/22/2021

Linear Regression with Distributed Learning: A Generalization Error Perspective

Distributed learning provides an attractive framework for scaling the le...
research
10/06/2021

Robust Multi-dimensional Model Order Estimation Using LineAr Regression of Global Eigenvalues (LaRGE)

The efficient estimation of an approximate model order is very important...
research
06/25/2018

Does data interpolation contradict statistical optimality?

We show that learning methods interpolating the training data can achiev...
research
08/27/2021

Targeting Underrepresented Populations in Precision Medicine: A Federated Transfer Learning Approach

The limited representation of minorities and disadvantaged populations i...
research
09/16/2023

Optimal Estimation under a Semiparametric Density Ratio Model

In many statistical and econometric applications, we gather individual s...
research
02/25/2021

An Online Learning Approach to Interpolation and Extrapolation in Domain Generalization

A popular assumption for out-of-distribution generalization is that the ...
research
10/26/2020

Interpretable Assessment of Fairness During Model Evaluation

For companies developing products or algorithms, it is important to unde...
