An Investigation of Noise in Morphological Inflection

05/26/2023
by Adam Wiemerslage, et al.

With a growing focus on morphological inflection systems for languages where high-quality data is scarce, noise in training data is a serious but so far largely ignored concern. We aim to close this gap by investigating the types of noise encountered within a pipeline for truly unsupervised morphological paradigm completion and their impact on morphological inflection systems: First, we propose an error taxonomy and annotation pipeline for inflection training data. Then, we compare the effect of different types of noise on multiple state-of-the-art inflection models. Finally, we propose a novel character-level masked language modeling (CMLM) pretraining objective and explore its impact on the models' resistance to noise. Our experiments show that the architectures are affected differently by the different types of noise, but that encoder-decoders tend to be more robust to noise than models trained with a copy bias. CMLM pretraining helps transformers, but has less impact on LSTMs.
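To make the CMLM objective concrete, the sketch below shows one way to build character-level masked language modeling examples: randomly replace characters in a word with a mask symbol and keep the original characters as reconstruction targets. The masking rate, the single <MASK> symbol, and the function names are assumptions for illustration only; the paper's exact pretraining setup may differ.

```python
# Minimal sketch of character-level masked language modeling (CMLM) data
# preparation. Masking rate and <MASK> symbol are illustrative assumptions.
import random

MASK = "<MASK>"

def cmlm_example(word, mask_prob=0.15, rng=random):
    """Mask a random subset of characters in `word`.

    Returns (masked_sequence, targets), where targets holds the original
    character at each masked position and None elsewhere, so a model can
    be trained to reconstruct only the masked characters.
    """
    masked, targets = [], []
    for ch in word:
        if rng.random() < mask_prob:
            masked.append(MASK)
            targets.append(ch)
        else:
            masked.append(ch)
            targets.append(None)
    return masked, targets

if __name__ == "__main__":
    random.seed(0)
    print(cmlm_example("geschrieben", mask_prob=0.3))
```

In this kind of setup, the pretrained character encoder (or encoder-decoder) is then fine-tuned on the inflection task, with the hope that exposure to reconstructing corrupted character sequences makes it less sensitive to noisy training examples.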


