Flexible text generation for counterfactual fairness probing

06/28/2022
by Zee Fryer, et al.

A common approach for testing fairness issues in text-based classifiers is through the use of counterfactuals: does the classifier output change if a sensitive attribute in the input is changed? Existing counterfactual generation methods typically rely on wordlists or templates, producing simple counterfactuals that don't take into account grammar, context, or subtle sensitive attribute references, and could miss issues that the wordlist creators had not considered. In this paper, we introduce a task for generating counterfactuals that overcomes these shortcomings, and demonstrate how large language models (LLMs) can be leveraged to make progress on this task. We show that this LLM-based method can produce complex counterfactuals that existing methods cannot, comparing the performance of various counterfactual generation methods on the Civil Comments dataset and showing their value in evaluating a toxicity classifier.
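To make the probing setup concrete, here is a minimal sketch of the wordlist-style baseline the abstract describes, not the LLM-based method the paper introduces. The names `SWAPS`, `wordlist_counterfactual`, `score_gap`, and `toy_classifier` are hypothetical and chosen for illustration. The toy classifier is a deliberately biased stand-in, not a real toxicity model.

```python
import re
from typing import Callable, Dict

# Hypothetical wordlist mapping a sensitive-attribute term to a substitute.
# Real wordlists are far larger; this one exists only for illustration.
SWAPS: Dict[str, str] = {"he": "she", "his": "her", "man": "woman"}


def wordlist_counterfactual(text: str, swaps: Dict[str, str]) -> str:
    """Replace whole-word matches from the wordlist, keeping initial capitalisation."""
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(term) for term in swaps) + r")\b",
        re.IGNORECASE,
    )

    def substitute(match: "re.Match[str]") -> str:
        word = match.group(0)
        replacement = swaps[word.lower()]
        return replacement.capitalize() if word[0].isupper() else replacement

    return pattern.sub(substitute, text)


def score_gap(
    classifier: Callable[[str], float], text: str, swaps: Dict[str, str]
) -> float:
    """Absolute change in classifier score between a text and its counterfactual."""
    counterfactual = wordlist_counterfactual(text, swaps)
    return abs(classifier(text) - classifier(counterfactual))


def toy_classifier(text: str) -> float:
    """Stand-in scorer that is deliberately sensitive to one term (assumption)."""
    tokens = re.findall(r"\w+", text.lower())
    return 0.9 if "woman" in tokens else 0.1


if __name__ == "__main__":
    original = "He is a man."
    print(wordlist_counterfactual(original, SWAPS))
    print(score_gap(toy_classifier, original, SWAPS))
```

A large score gap flags a potential fairness issue for that input. The sketch also shows the limitations the abstract points to: the substitution ignores grammar and context (for example, "her" can correspond to either "his" or "him"), and it can only catch terms that someone thought to put in the wordlist.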


