Abstracting Influence Paths for Explaining (Contextualization of) BERT Models

11/02/2020
by Kaiji Lu, et al.

While "attention is all you need" may be proving true, we do not yet know why: attention-based models such as BERT are superior but how they contextualize information even for simple grammatical rules such as subject-verb number agreement (SVA) is uncertain. We introduce multi-partite patterns, abstractions of sets of paths through a neural network model. Patterns quantify and localize the effect of an input concept (e.g., a subject's number) on an output concept (e.g. corresponding verb's number) to paths passing through a sequence of model components, thus surfacing how BERT contextualizes information. We describe guided pattern refinement, an efficient search procedure for finding patterns representative of concept-critical paths. We discover that patterns generate succinct and meaningful explanations for BERT, highlighted by "copy" and "transfer" operations implemented by skip connections and attention heads, respectively. We also show how pattern visualizations help us understand how BERT contextualizes various grammatical concepts, such as SVA across clauses, and why it makes errors in some cases while succeeding in others.


