WILDS: A Benchmark of in-the-Wild Distribution Shifts

12/14/2020
by Pang Wei Koh, et al.

Distribution shifts can cause significant degradation in a broad range of machine learning (ML) systems deployed in the wild. However, many widely-used datasets in the ML community today were not designed for evaluating distribution shifts. These datasets typically have training and test sets drawn from the same distribution, and prior work on retrofitting them with distribution shifts has generally relied on artificial shifts that need not represent the kinds of shifts encountered in the wild. In this paper, we present WILDS, a benchmark of in-the-wild distribution shifts spanning diverse data modalities and applications, from tumor identification to wildlife monitoring to poverty mapping. WILDS builds on top of recent data collection efforts by domain experts in these applications and provides a unified collection of datasets with evaluation metrics and train/test splits that are representative of real-world distribution shifts. These datasets reflect distribution shifts arising from training and testing on different hospitals, cameras, countries, time periods, demographics, molecular scaffolds, etc., all of which cause substantial performance drops in our baseline models. Finally, we survey other applications that would be promising additions to the benchmark but for which we did not manage to find appropriate datasets; we discuss their associated challenges and detail datasets and shifts where we did not see an appreciable performance drop. By unifying datasets from a variety of application areas and making them accessible to the ML community, we hope to encourage the development of general-purpose methods that are anchored to real-world distribution shifts and that work well across different applications and problem settings. Data loaders, default models, and leaderboards are available at https://wilds.stanford.edu.
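The benchmark ships with a Python package that wraps every dataset behind a common interface. Below is a minimal sketch of loading one dataset and iterating over its official training split, following the quickstart pattern documented for the wilds package; the dataset choice (iwildcam), image size, and batch size are illustrative assumptions, not requirements.

```python
# Minimal sketch: loading a WILDS dataset through the benchmark's
# Python package (`pip install wilds`). The dataset name, image size,
# and batch size below are illustrative choices, not requirements.
import torchvision.transforms as transforms

from wilds import get_dataset
from wilds.common.data_loaders import get_train_loader

# Download iWildCam (wildlife monitoring; the distribution shift is
# across camera traps deployed at different locations).
dataset = get_dataset(dataset="iwildcam", download=True)

# The official train/test splits ship with the dataset.
train_data = dataset.get_subset(
    "train",
    transform=transforms.Compose(
        [transforms.Resize((448, 448)), transforms.ToTensor()]
    ),
)

# "standard" yields i.i.d. sampling; group-aware loaders are also available.
train_loader = get_train_loader("standard", train_data, batch_size=16)

for x, y_true, metadata in train_loader:
    ...  # train a model; metadata encodes the domain (e.g., camera ID)
```

Per the package documentation, the same dataset object also exposes the out-of-distribution splits (e.g., "test") via get_subset, along with a dataset.eval(...) helper that computes the official per-dataset evaluation metrics, so reported results stay comparable across methods.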


Related research

11/25/2022 · Wild-Time: A Benchmark of in-the-Wild Distribution Shift over Time
Distribution shift occurs when the test distribution differs from the tr...

03/05/2023 · Robustness, Evaluation and Adaptation of Machine Learning Models in the Wild
Our goal is to improve reliability of Machine Learning (ML) systems depl...

12/09/2021 · Extending the WILDS Benchmark for Unsupervised Adaptation
Machine learning systems deployed in the wild are often trained on a sou...

12/31/2021 · Improving Baselines in the Wild
We share our experience with the recently released WILDS benchmark, a co...

07/29/2021 · Did the Model Change? Efficiently Assessing Machine Learning API Shifts
Machine learning (ML) prediction APIs are increasingly widely used. An M...

04/07/2023 · Supervised Contrastive Learning with Heterogeneous Similarity for Distribution Shifts
Distribution shifts are problems where the distribution of data changes ...

09/26/2022 · A Comprehensive Review of Trends, Applications and Challenges In Out-of-Distribution Detection
With recent advancements in artificial intelligence, its applications ca...
