Context Variance Evaluation of Pretrained Language Models for Prompt-based Biomedical Knowledge Probing

by   Zonghai Yao, et al.

Pretrained language models (PLMs) have motivated research on what kinds of knowledge these models learn. Fill-in-the-blanks problem (e.g., cloze tests) is a natural approach for gauging such knowledge. BioLAMA generates prompts for biomedical factual knowledge triples and uses the Top-k accuracy metric to evaluate different PLMs' knowledge. However, existing research has shown that such prompt-based knowledge probing methods can only probe a lower bound of knowledge. Many factors like prompt-based probing biases make the LAMA benchmark unreliable and unstable. This problem is more prominent in BioLAMA. The severe long-tailed distribution in vocabulary and large-N-M relation make the performance gap between LAMA and BioLAMA remain notable. To address these, we introduce context variance into the prompt generation and propose a new rank-change-based evaluation metric. Different from the previous known-unknown evaluation criteria, we propose the concept of "Misunderstand" in LAMA for the first time. Through experiments on 12 PLMs, our context variance prompts and Understand-Confuse-Misunderstand (UCM) metric makes BioLAMA more friendly to large-N-M relations and rare relations. We also conducted a set of control experiments to disentangle "understand" from just "read and copy".


page 1

page 2

page 3

page 4


Extracting Biomedical Factual Knowledge Using Pretrained Language Model and Electronic Health Record Context

Language Models (LMs) have performed well on biomedical natural language...

AutoPrompt: Eliciting Knowledge from Language Models with Automatically Generated Prompts

The remarkable success of pretrained language models has motivated the s...

HAE-RAE Bench: Evaluation of Korean Knowledge in Language Models

Large Language Models (LLMs) pretrained on massive corpora exhibit remar...

Large Language Models, scientific knowledge and factuality: A systematic analysis in antibiotic discovery

Inferring over and extracting information from Large Language Models (LL...

Inspecting the concept knowledge graph encoded by modern language models

The field of natural language understanding has experienced exponential ...

Understanding Knowledge Integration in Language Models with Graph Convolutions

Pretrained language models (LMs) do not capture factual knowledge very w...

SLING: Sino Linguistic Evaluation of Large Language Models

To understand what kinds of linguistic knowledge are encoded by pretrain...

Please sign up or login with your details

Forgot password? Click here to reset