Mind Your Outliers! Investigating the Negative Impact of Outliers on Active Learning for Visual Question Answering

by   Siddharth Karamcheti, et al.

Active learning promises to alleviate the massive data needs of supervised machine learning: it has successfully improved sample efficiency by an order of magnitude on traditional tasks like topic classification and object recognition. However, we uncover a striking contrast to this promise: across 5 models and 4 datasets on the task of visual question answering, a wide variety of active learning approaches fail to outperform random selection. To understand this discrepancy, we profile 8 active learning methods on a per-example basis, and identify the problem as collective outliers – groups of examples that active learning methods prefer to acquire but models fail to learn (e.g., questions that ask about text in images or require external knowledge). Through systematic ablation experiments and qualitative visualizations, we verify that collective outliers are a general phenomenon responsible for degrading pool-based active learning. Notably, we show that active learning sample efficiency increases significantly as the number of collective outliers in the active learning pool decreases. We conclude with a discussion and prescriptive recommendations for mitigating the effects of these outliers in future work.


page 1

page 6

page 7

page 17


Investigating Multi-source Active Learning for Natural Language Inference

In recent years, active learning has been successfully applied to an arr...

Active Learning for Visual Question Answering: An Empirical Study

We present an empirical study of active learning for Visual Question Ans...

Can I see an Example? Active Learning the Long Tail of Attributes and Relations

There has been significant progress in creating machine learning models ...

Picking groups instead of samples: A close look at Static Pool-based Meta-Active Learning

Active Learning techniques are used to tackle learning problems where ob...

Active Learning Over Multiple Domains in Natural Language Tasks

Studies of active learning traditionally assume the target and source da...

Learning Objective-Specific Active Learning Strategies with Attentive Neural Processes

Pool-based active learning (AL) is a promising technology for increasing...

On the Importance of Adaptive Data Collection for Extremely Imbalanced Pairwise Tasks

Many pairwise classification tasks, such as paraphrase detection and ope...

Please sign up or login with your details

Forgot password? Click here to reset