Predictive Heterogeneity: Measures and Applications

by Jiashuo Liu, et al.
Tsinghua University

As an intrinsic and fundamental property of big data, data heterogeneity arises in a variety of real-world applications, such as precision medicine, autonomous driving, and finance. For machine learning algorithms, ignoring data heterogeneity can greatly hurt generalization performance and algorithmic fairness, since the prediction mechanisms of different sub-populations are likely to differ from each other. In this work, we focus on the data heterogeneity that affects the prediction of machine learning models, and first propose the usable predictive heterogeneity, which takes into account model capacity and computational constraints. We prove that it can be reliably estimated from finite data with probably approximately correct (PAC) bounds. Additionally, we design a bi-level optimization algorithm to explore the usable predictive heterogeneity from data. Empirically, the explored heterogeneity provides insights for sub-population divisions in income prediction, crop yield prediction, and image classification tasks, and leveraging such heterogeneity improves out-of-distribution generalization performance.




Exploring and Exploiting Data Heterogeneity in Recommendation

Massive amounts of data are the foundation of data-driven recommendation...

Federated Variational Inference: Towards Improved Personalization and Generalization

Conventional federated learning algorithms train a single global model b...

gLOP: the global and Local Penalty for Capturing Predictive Heterogeneity

When faced with a supervised learning problem, we hope to have rich enou...

Using Latent Class Analysis to Identify ARDS Sub-phenotypes for Enhanced Machine Learning Predictive Performance

In this work, we utilize Machine Learning for early recognition of patie...

ICON^2: Reliably Benchmarking Predictive Inequity in Object Detection

As computer vision systems are being increasingly deployed at scale in h...
