ScoNe: Benchmarking Negation Reasoning in Language Models With Fine-Tuning and In-Context Learning

05/30/2023
by Jingyuan Selena She, et al.

A number of recent benchmarks seek to assess how well models handle natural language negation. However, these benchmarks lack the controlled example paradigms that would allow us to infer whether a model had learned how negation morphemes semantically scope. To fill these analytical gaps, we present the Scoped Negation NLI (ScoNe-NLI) benchmark, which contains contrast sets of six examples with up to two negations where either zero, one, or both negative morphemes affect the NLI label. We use ScoNe-NLI to assess fine-tuning and in-context learning strategies. We find that RoBERTa and DeBERTa models solve ScoNe-NLI after many-shot fine-tuning. For in-context learning, we test InstructGPT models and find that most prompt strategies are not successful, including those using step-by-step reasoning. To better understand this result, we extend ScoNe with ScoNe-NLG, a sentence completion test set that embeds negation reasoning in short narratives. Here, InstructGPT is successful, which reveals that the model can correctly reason about negation, but struggles to do so on prompt-adapted NLI examples outside of its core pretraining regime.
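To make the contrast-set design concrete, the sketch below probes an off-the-shelf NLI model on minimally different premises paired with a single hypothesis, where the gold label depends on whether a negation semantically scopes over the key clause. It assumes the Hugging Face transformers library and the public roberta-large-mnli checkpoint; the premise/hypothesis sentences are invented illustrations in the spirit of ScoNe-NLI, not items from the released dataset.

```python
# Illustrative probe of scoped-negation reasoning with an MNLI-tuned model.
# The sentences are hypothetical examples, not drawn from ScoNe-NLI itself.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

MODEL = "roberta-large-mnli"  # public MNLI checkpoint
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForSequenceClassification.from_pretrained(MODEL)
model.eval()

# A contrast-set-style triple: same hypothesis, premises that differ only in
# how many negations scope over the chasing event (zero, one, or two).
contrast_set = [
    # (premise, hypothesis, gold_label)
    ("The dog chased the cat.",
     "The dog chased an animal.", "entailment"),
    ("The dog did not chase the cat.",
     "The dog chased an animal.", "neutral"),
    ("It is not true that the dog failed to chase the cat.",
     "The dog chased an animal.", "entailment"),
]

with torch.no_grad():
    for premise, hypothesis, gold in contrast_set:
        inputs = tokenizer(premise, hypothesis,
                           return_tensors="pt", truncation=True)
        logits = model(**inputs).logits
        pred = model.config.id2label[logits.argmax(dim=-1).item()].lower()
        print(f"gold={gold:<11} pred={pred:<13} premise={premise}")
```

A model that has genuinely learned how negation scopes should flip its prediction between the one-negation premise and the other two, rather than keying on the mere presence of a negative morpheme.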

