
Polish Natural Language Inference and Factivity – an Expert-based Dataset and Benchmarks

01/10/2022
by Daniel Ziembicki, et al.
Politechnika Warszawska
Uniwersytet Warszawski

Despite recent breakthroughs in machine learning for natural language processing, Natural Language Inference (NLI) remains a challenge. To this end, we contribute a new dataset that focuses exclusively on the factivity phenomenon; the task itself is the same as in other NLI work, i.e. predicting entailment, contradiction, or neutral (ECN). The dataset consists entirely of natural language utterances in Polish and comprises 2,432 verb-complement pairs and 309 unique verbs. It is based on the National Corpus of Polish (NKJP) and is a representative sample with regard to the frequency of main verbs and other linguistic features (e.g. the occurrence of internal negation). We found that transformer BERT-based models working on sentences obtained relatively good results (≈89% F1 score). Even though better results were achieved using linguistic features (≈91% F1 score), this model requires more human labour (humans in the loop) because the features were prepared manually by expert linguists. The results of BERT-based models that consume only the input sentences show that they capture most of the complexity of NLI/factivity. Complex cases, e.g. those involving entailment (E) and non-factive verbs, remain an open issue for further research.
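To make the evaluation setting concrete, the following is a minimal sketch of how an F1 score over the three ECN labels could be computed. The abstract does not say which averaging scheme was used, so the macro average here is an assumption, and the function names and the toy labels are hypothetical illustrations rather than data from the dataset.

```python
from collections import Counter

# The three NLI labels named in the abstract: entailment, contradiction, neutral.
LABELS = ("E", "C", "N")


def per_class_f1(gold, pred, labels=LABELS):
    """Return a dict mapping each label to its F1 score."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted label p, but it was wrong
            fn[g] += 1  # gold label g was missed
    scores = {}
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores[label] = 2 * tp[label] / denom if denom else 0.0
    return scores


def macro_f1(gold, pred, labels=LABELS):
    """Unweighted mean of per-class F1 (assumed averaging scheme)."""
    scores = per_class_f1(gold, pred, labels)
    return sum(scores.values()) / len(labels)


# Toy, made-up labels purely for illustration.
gold = ["E", "E", "C", "N", "N", "N"]
pred = ["E", "N", "C", "N", "N", "E"]
print(per_class_f1(gold, pred))
print(round(macro_f1(gold, pred), 4))
```

A per-class breakdown of this kind is useful for the setting described above, because an aggregate score can hide weak performance on a single label such as E.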

