Perturbations and Subpopulations for Testing Robustness in Token-Based Argument Unit Recognition

09/29/2022
by Jonathan Kamp, et al.

Argument Unit Recognition and Classification aims at identifying argument units in text and classifying them as pro or con. One design choice that must be made when developing systems for this task is the unit of classification: segments of tokens or full sentences. Previous research suggests that fine-tuning language models at the token level yields more robust results for classifying sentences than training on sentences directly. We reproduce the study that originally made this claim and further investigate what exactly token-based systems learn better than sentence-based ones. We develop systematic tests for analysing the behavioural differences between the token-based and the sentence-based system. Our results show that token-based models are generally more robust than sentence-based models, both on manually perturbed examples and on specific subpopulations of the data.
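The abstract contrasts token-level and sentence-level classification and tests robustness on perturbed examples. The sketch below illustrates both ideas under stated assumptions and is not the authors' code: the PRO/CON/NON label names, `sentence_label_from_tokens` (a majority vote between PRO and CON tokens, with ties mapped to NON), the keyword-based `toy_token_tagger`, and `invariance_failure_rate` are all hypothetical stand-ins. The invariance check counts how often a perturbation that should not change the stance nonetheless flips the predicted sentence label.

```python
from collections import Counter
from typing import Callable, List, Sequence, Tuple

# Hypothetical label scheme: PRO / CON for argumentative tokens, NON otherwise.


def sentence_label_from_tokens(token_labels: Sequence[str]) -> str:
    """Aggregate token-level labels into a single sentence label.

    Assumption: majority vote between PRO and CON tokens; a sentence with
    no argumentative tokens, or a PRO/CON tie, is labelled NON.
    """
    counts = Counter(token_labels)
    pro, con = counts["PRO"], counts["CON"]
    if pro == con:  # covers both "no argumentative tokens" and exact ties
        return "NON"
    return "PRO" if pro > con else "CON"


def toy_token_tagger(sentence: str) -> List[str]:
    """Hypothetical keyword stand-in for a fine-tuned token classifier."""
    pro_words = {"good", "beneficial", "safe"}
    con_words = {"bad", "harmful", "dangerous"}
    tags = []
    for token in sentence.lower().split():  # naive whitespace tokenisation
        if token in pro_words:
            tags.append("PRO")
        elif token in con_words:
            tags.append("CON")
        else:
            tags.append("NON")
    return tags


def predict_sentence(sentence: str) -> str:
    """Token-based sentence classification: tag tokens, then aggregate."""
    return sentence_label_from_tokens(toy_token_tagger(sentence))


def invariance_failure_rate(
    predict: Callable[[str], str], pairs: Sequence[Tuple[str, str]]
) -> float:
    """Fraction of (original, perturbed) pairs whose predicted label changes.

    Every perturbation is assumed to be label-preserving, so any change
    in prediction counts as a robustness failure.
    """
    if not pairs:
        return 0.0
    failures = sum(predict(orig) != predict(pert) for orig, pert in pairs)
    return failures / len(pairs)


# Label-preserving perturbations: a synonym swap and added punctuation.
pairs = [
    ("Nuclear energy is safe", "Nuclear power is safe"),
    ("Cloning is harmful", "Cloning is harmful!"),
]
print(invariance_failure_rate(predict_sentence, pairs))  # → 0.5
```

The toy tagger fails the punctuation perturbation because whitespace splitting leaves `harmful!` as a single unrecognised token. A real evaluation would replace `predict_sentence` with the actual token-based and sentence-based models and compare their failure rates on the same perturbation pairs or data subpopulations.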
