Factual Probing Is [MASK]: Learning vs. Learning to Recall

by   Zexuan Zhong, et al.

Petroni et al. (2019) demonstrated that it is possible to retrieve world facts from a pre-trained language model by expressing them as cloze-style prompts and interpret the model's prediction accuracy as a lower bound on the amount of factual information it encodes. Subsequent work has attempted to tighten the estimate by searching for better prompts, using a disjoint set of facts as training data. In this work, we make two complementary contributions to better understand these factual probing techniques. First, we propose OptiPrompt, a novel and efficient method which directly optimizes in continuous embedding space. We find this simple method is able to predict an additional 6.4 question: Can we really interpret these probing results as a lower bound? Is it possible that these prompt-search methods learn from the training data too? We find, somewhat surprisingly, that the training data used by these methods contains certain regularities of the underlying fact distribution, and all the existing prompt methods, including ours, are able to exploit them for better fact prediction. We conduct a set of control experiments to disentangle "learning" from "learning to recall", providing a more detailed picture of what different prompts can reveal about pre-trained language models.


page 1

page 2

page 3

page 4


Pre-trained Language Models as Symbolic Reasoners over Knowledge?

How can pre-trained language models (PLMs) learn factual knowledge from ...

Pre-trained language models as knowledge bases for Automotive Complaint Analysis

Recently it has been shown that large pre-trained language models like B...

A Data Cartography based MixUp for Pre-trained Language Models

MixUp is a data augmentation strategy where additional samples are gener...

Measuring and Manipulating Knowledge Representations in Language Models

Neural language models (LMs) represent facts about the world described b...

BERT-kNN: Adding a kNN Search Component to Pretrained Language Models for Better QA

Khandelwal et al. (2020) show that a k-nearest-neighbor (kNN) component ...

Sherlock: Scalable Fact Learning in Images

We study scalable and uniform understanding of facts in images. Existing...

Can Deep Neural Networks Predict Data Correlations from Column Names?

For humans, it is often possible to predict data correlations from colum...

Please sign up or login with your details

Forgot password? Click here to reset