Investigating BERT's Knowledge of Language: Five Analysis Methods with NPIs

09/05/2019
by Alex Warstadt, et al.

Though state-of-the-art sentence representation models can perform tasks requiring significant knowledge of grammar, it is an open question how best to evaluate their grammatical knowledge. We explore five experimental methods inspired by prior work evaluating pretrained sentence representation models. We use a single linguistic phenomenon, negative polarity item (NPI) licensing in English, as a case study for our experiments. NPIs like "any" are grammatical only if they appear in a licensing environment like negation ("Sue doesn't have any cats" vs. "*Sue has any cats", where the asterisk marks an unacceptable sentence). This phenomenon is challenging because of the variety of NPI licensing environments that exist. We introduce an artificially generated dataset that manipulates key features of NPI licensing for the experiments. We find that BERT has significant knowledge of these features, but its success varies widely across different experimental methods. We conclude that a variety of methods is necessary to reveal all relevant aspects of a model's grammatical knowledge in a given domain.
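As a toy illustration of the contrast in the abstract's example, the sketch below builds a small 2x2 paradigm that crosses the presence of a licensor (negation) with the presence of the NPI "any". It is not the authors' dataset generator: the function names, the sentence template, and the simplified acceptability rule are assumptions made for illustration, and the rule covers only the negation case mentioned above.

```python
from itertools import product

def make_sentence(licensor: bool, npi: bool) -> str:
    """Build one sentence from a fixed template (hypothetical, for illustration)."""
    verb = "doesn't have" if licensor else "has"
    determiner = "any " if npi else ""
    return f"Sue {verb} {determiner}cats"

def is_acceptable(licensor: bool, npi: bool) -> bool:
    """Simplified rule: an NPI is acceptable only when a licensor is present."""
    return licensor or not npi

# Cross licensor presence with NPI presence to get four labeled sentences.
paradigm = [
    {
        "sentence": make_sentence(licensor, npi),
        "licensor": licensor,
        "npi": npi,
        "acceptable": is_acceptable(licensor, npi),
    }
    for licensor, npi in product([True, False], repeat=2)
]

for row in paradigm:
    mark = "" if row["acceptable"] else "*"
    print(f"{mark}{row['sentence']}")
```

Only the cell with an NPI and no licensor is marked unacceptable, which matches the pair of example sentences quoted in the abstract.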


