Automatic Readability Assessment of German Sentences with Transformer Ensembles

09/09/2022
by Patrick Gustav Blaneck, et al.

Reliable methods for automatic readability assessment have the potential to impact a variety of fields, ranging from machine translation to self-informed learning. Large language models for German (such as GBERT and GPT-2-Wechsel) have recently become available, making it possible to develop deep-learning-based approaches that promise to further improve automatic readability assessment. In this contribution, we studied how reliably ensembles of fine-tuned GBERT and GPT-2-Wechsel models predict the readability of German sentences. We combined these models with linguistic features and investigated how prediction performance depends on ensemble size and composition. Mixed ensembles of GBERT and GPT-2-Wechsel models performed better than ensembles of the same size consisting only of GBERT models or only of GPT-2-Wechsel models. Our models were evaluated in the GermEval 2022 Shared Task on Text Complexity Assessment, on a dataset of German sentences. On out-of-sample data, our best ensemble achieved a root mean squared error of 0.435.
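
As a rough illustration of the two quantities mentioned above, the following minimal sketch combines per-sentence scores from several ensemble members and computes the root mean squared error against reference ratings. The abstract does not state how member predictions are combined, so the unweighted mean used here is an assumption, and the function names `ensemble_predict` and `rmse` are hypothetical helpers, not part of the authors' code.

```python
import math
from statistics import fmean


def ensemble_predict(member_predictions):
    """Combine per-sentence scores from several models.

    member_predictions: one list of scores per ensemble member, all of the
    same length. ASSUMPTION: members are combined by an unweighted mean;
    the source does not specify the aggregation rule.
    """
    return [fmean(scores) for scores in zip(*member_predictions)]


def rmse(predicted, target):
    """Root mean squared error between predicted and reference scores."""
    if len(predicted) != len(target):
        raise ValueError("predicted and target must have the same length")
    squared_errors = [(p - t) ** 2 for p, t in zip(predicted, target)]
    return math.sqrt(fmean(squared_errors))


# Illustrative values only, not data from the shared task.
members = [
    [1.0, 2.0, 3.0],  # scores from a first fine-tuned model
    [3.0, 2.0, 5.0],  # scores from a second fine-tuned model
]
combined = ensemble_predict(members)
error = rmse(combined, [2.0, 2.0, 2.0])
```

A lower value of `rmse` means the combined scores are closer to the reference ratings; the reported 0.435 is a score of this kind, measured on the shared task's out-of-sample data.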

