Exploring the Efficacy of Automatically Generated Counterfactuals for Sentiment Analysis

06/29/2021
by   Linyi Yang, et al.

While state-of-the-art NLP models have achieved excellent performance on a wide range of tasks in recent years, important questions are being raised about their robustness and their underlying sensitivity to systematic biases that may exist in their training and test data. Such issues manifest as performance problems when models are faced with out-of-distribution data in the field. One recent solution has been to use counterfactually augmented datasets to reduce any reliance on spurious patterns that may exist in the original data. Producing high-quality augmented data can be costly and time-consuming, as it usually involves human feedback and crowdsourcing efforts. In this work, we propose an alternative by describing and evaluating an approach to automatically generating counterfactual data for data augmentation and explanation. A comprehensive evaluation on several different datasets, using a variety of state-of-the-art benchmarks, demonstrates how our approach can achieve significant improvements in model performance when compared to models trained on the original data, and even when compared to models trained with the benefit of human-generated augmented data.
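To illustrate the general idea of counterfactual data augmentation described above, the following is a minimal sketch: sentiment-bearing words are substituted with opposites and the label is flipped, and the generated examples are added to the training set. The hand-built antonym lexicon and the function names here are illustrative assumptions, not the paper's actual generation pipeline, which identifies causally important terms automatically.

```python
# Illustrative sketch of counterfactual augmentation for sentiment analysis.
# ANTONYMS is a hypothetical toy lexicon; a real system would discover
# sentiment-bearing terms and substitutions automatically.
ANTONYMS = {
    "good": "bad", "bad": "good",
    "great": "terrible", "terrible": "great",
    "love": "hate", "hate": "love",
}

def generate_counterfactual(text, label):
    """Swap sentiment-bearing words and flip the label.

    Returns (new_text, new_label), or None when no substitution is
    possible and hence no counterfactual can be generated.
    """
    out, swapped = [], False
    for tok in text.split():
        if tok.lower() in ANTONYMS:
            out.append(ANTONYMS[tok.lower()])
            swapped = True
        else:
            out.append(tok)
    if not swapped:
        return None
    flipped = "negative" if label == "positive" else "positive"
    return " ".join(out), flipped

def augment(dataset):
    """Return the original (text, label) pairs plus any counterfactuals."""
    augmented = list(dataset)
    for text, label in dataset:
        cf = generate_counterfactual(text, label)
        if cf is not None:
            augmented.append(cf)
    return augmented
```

Training a classifier on the union of original and counterfactual examples is what discourages it from latching onto spurious surface patterns: each augmented pair differs only in the causally relevant words, so those words are the only reliable signal.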


Related research

06/24/2023 · Towards Robust Aspect-based Sentiment Analysis through Non-counterfactual Augmentations
While state-of-the-art NLP models have demonstrated excellent performanc...

10/09/2020 · Counterfactually-Augmented SNLI Training Data Does Not Yield Better Generalization Than Unaugmented Data
A growing body of work shows that models exploit annotation artifacts to...

09/14/2023 · CATfOOD: Counterfactual Augmented Training for Improving Out-of-Domain Performance and Calibration
In recent years, large language models (LLMs) have shown remarkable capa...

10/30/2022 · Counterfactual Data Augmentation via Perspective Transition for Open-Domain Dialogues
The construction of open-domain dialogue systems requires high-quality d...

11/14/2022 · CST5: Data Augmentation for Code-Switched Semantic Parsing
Extending semantic parsers to code-switched input has been a challenging...

09/30/2021 · CrossAug: A Contrastive Data Augmentation Method for Debiasing Fact Verification Models
Fact verification datasets are typically constructed using crowdsourcing...

10/21/2022 · Robustifying Sentiment Classification by Maximally Exploiting Few Counterfactuals
For text classification tasks, finetuned language models perform remarka...
