Pseudo-Labels Are All You Need

08/19/2022
by Bogdan Kostić, et al.

Automatically estimating the complexity of texts for readers has a variety of applications, such as recommending texts of an appropriate complexity level to language learners or supporting the evaluation of text simplification approaches. In this paper, we present our submission to the Text Complexity DE Challenge 2022, a regression task whose goal is to predict the complexity of a German sentence for German learners at level B. Our approach uses more than 220,000 pseudo-labels created from the German Wikipedia and other corpora to train Transformer-based models, and it relies on neither feature engineering nor additional labeled data. We find that the pseudo-label-based approach gives impressive results while requiring little to no adjustment to the specific task, so it could be easily adapted to other domains and tasks.
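The general idea of pseudo-labeling for a regression task can be illustrated with a minimal sketch. It is not the authors' pipeline: the abstract gives no implementation details, so the one-feature least-squares model below is only a stand-in for the Transformer-based models. The numeric inputs, the toy scores, and the names `fit_line`, `predict`, `teacher`, and `student` are all hypothetical.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept (one feature)."""
    mx, my = mean(xs), mean(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    return slope, my - slope * mx


def predict(model, xs):
    """Apply a fitted (slope, intercept) pair to a list of inputs."""
    slope, intercept = model
    return [slope * x + intercept for x in xs]


# Hypothetical toy data: a small set with human complexity scores
# and a larger pool of inputs without any scores.
labeled_x = [5.0, 10.0, 20.0, 30.0]
labeled_y = [1.2, 2.1, 3.9, 6.0]
unlabeled_x = [float(n) for n in range(3, 40)]

# Step 1: fit a teacher model on the small labeled set.
teacher = fit_line(labeled_x, labeled_y)

# Step 2: let the teacher score the unlabeled pool; these are the pseudo-labels.
pseudo_y = predict(teacher, unlabeled_x)

# Step 3: train a student model on the pseudo-labeled pool only.
student = fit_line(unlabeled_x, pseudo_y)
```

In this toy setting the student simply recovers the teacher's line, because both models come from the same family. The approach becomes interesting when the student is a different, higher-capacity model trained on a far larger pseudo-labeled pool, which is the setting the abstract describes.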

