What Changed? Investigating Debiasing Methods using Causal Mediation Analysis

06/01/2022
by Sullam Jeoung, et al.

Previous work has examined how debiasing language models affects downstream tasks: specifically, how debiasing techniques influence task performance and whether debiased models also make impartial predictions in downstream tasks. What is not yet well understood is why debiasing methods have varying impacts on downstream tasks and how they affect the internal components of language models, i.e., neurons, layers, and attention heads. In this paper, we decompose the internal mechanisms of debiasing language models with respect to gender by applying causal mediation analysis to understand the influence of debiasing methods on toxicity detection as a downstream task. Our findings suggest a need to test the effectiveness of debiasing methods with different bias metrics, and to focus on changes in the behavior of certain components of the models, e.g., the first two layers of language models and attention heads.
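As a rough illustration of the general idea behind causal mediation analysis, the sketch below computes total, direct, and indirect effects on a toy linear model. It is not the paper's implementation: the functions `mediator`, `output`, `run`, and `effects`, the coefficients, and the use of raw output differences as the effect measure are all assumptions made for illustration. The mediator stands in for an internal component such as a neuron, and the input change stands in for an intervention on the text.

```python
from typing import Dict, Optional


def mediator(x: float) -> float:
    """Hypothetical internal component (a stand-in for a neuron)."""
    return 2.0 * x


def output(x: float, m: float) -> float:
    """Hypothetical model output that depends on the input and the mediator."""
    return 0.5 * x + 3.0 * m


def run(x: float, mediator_input: Optional[float] = None) -> float:
    """Run the toy model; optionally compute the mediator from a different input."""
    m = mediator(x if mediator_input is None else mediator_input)
    return output(x, m)


def effects(x_base: float, x_alt: float) -> Dict[str, float]:
    """Effects of changing the input from x_base to x_alt (raw differences)."""
    y_base = run(x_base)
    return {
        # Change the input everywhere.
        "total": run(x_alt) - y_base,
        # Change the input, but hold the mediator at its baseline value.
        "direct": run(x_alt, mediator_input=x_base) - y_base,
        # Keep the input, but set the mediator to its value under x_alt.
        "indirect": run(x_base, mediator_input=x_alt) - y_base,
    }


if __name__ == "__main__":
    print(effects(0.0, 1.0))
```

Because this toy model is linear, the total effect equals the sum of the direct and indirect effects; in a real language model that decomposition generally does not hold exactly, which is one reason the effects are measured separately for each component.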


research
01/11/2023

Counteracts: Testing Stereotypical Representation in Pre-trained Language Models

Language models have demonstrated strong performance on various natural ...
research
06/09/2020

Examination and Extension of Strategies for Improving Personalized Language Modeling via Interpolation

In this paper, we detail novel strategies for interpolating personalized...
research
07/28/2023

The Hydra Effect: Emergent Self-repair in Language Model Computations

We investigate the internal structure of language model computations usi...
research
06/09/2023

Measuring and Modifying Factual Knowledge in Large Language Models

Large Language Models (LLMs) store an extensive amount of factual knowle...
research
09/12/2023

Circuit Breaking: Removing Model Behaviors with Targeted Ablation

Language models often exhibit behaviors that improve performance on a pr...
research
04/24/2023

PEFT-Ref: A Modular Reference Architecture and Typology for Parameter-Efficient Finetuning Techniques

Recent parameter-efficient finetuning (PEFT) techniques aim to improve o...
research
05/24/2023

Understanding Arithmetic Reasoning in Language Models using Causal Mediation Analysis

Mathematical reasoning in large language models (LLMs) has garnered atte...
