Learning to Learn to be Right for the Right Reasons

by Pride Kavumba, et al.

Improving generalization to held-out data is a core objective in commonsense reasoning. Recent work has shown that models trained on datasets containing superficial cues perform well on easy test instances that exhibit those cues, but poorly on hard instances that lack them. Previous approaches rely on manually designed methods to discourage models from overfitting to superficial cues; while some of these methods improve performance on hard instances, they also degrade performance on easy ones. Here, we propose to explicitly learn a model that performs well on both the easy test set, which contains superficial cues, and the hard test set, which does not. Using a meta-learning objective, we learn such a model and improve performance on both test sets. Evaluating our models on Choice of Plausible Alternatives (COPA) and Commonsense Explanation, we show that the proposed method improves performance on both the easy and the hard test sets, with gains of up to 16.5 percentage points over the baseline.
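The abstract's core idea, optimizing a single model so that it does well on both an easy split (with superficial cues) and a hard split (without them), can be illustrated with a first-order meta-learning sketch. The setup below is entirely hypothetical and is not the paper's actual objective, model, or data: a toy linear regressor is adapted on the easy split in an inner step, and the outer update combines the easy-split gradient with the hard-split gradient at the adapted weights, pushing the shared parameters toward solutions that generalize to both splits.

```python
import numpy as np

# Hypothetical toy setup (assumption; the paper's actual objective differs):
# feature 0 is a genuine signal, feature 1 is a superficial cue that is
# predictive only on the "easy" split.
rng = np.random.default_rng(0)

def loss_and_grad(w, X, y):
    """Mean squared error of a linear model and its gradient w.r.t. w."""
    resid = X @ w - y
    loss = float(np.mean(resid ** 2))
    grad = 2.0 * X.T @ resid / len(y)
    return loss, grad

X_easy = rng.normal(size=(64, 2))
y_easy = X_easy[:, 0] + X_easy[:, 1]   # the cue (feature 1) helps here
X_hard = rng.normal(size=(64, 2))
y_hard = X_hard[:, 0]                  # the cue is useless here

w = np.zeros(2)
inner_lr, outer_lr = 0.1, 0.05
for _ in range(200):
    # Inner step: adapt the weights on the easy split.
    _, g_easy = loss_and_grad(w, X_easy, y_easy)
    w_adapted = w - inner_lr * g_easy
    # Outer step (first-order meta-gradient): combine the easy-split
    # gradient with the hard-split gradient at the adapted weights, so
    # the update favors parameters that serve both splits.
    _, g_hard = loss_and_grad(w_adapted, X_hard, y_hard)
    w = w - outer_lr * (g_easy + g_hard)

loss_easy, _ = loss_and_grad(w, X_easy, y_easy)
loss_hard, _ = loss_and_grad(w, X_hard, y_hard)
```

Because the two splits disagree on the cue feature, the combined update settles on a compromise weight for it while fully learning the genuine feature, a simplified analogue of improving hard-set accuracy without sacrificing the easy set.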




