UniMorph 4.0: Universal Morphology

05/07/2022
by Khuyagbaatar Batsuren, et al.

The Universal Morphology (UniMorph) project is a collaborative effort providing broad-coverage instantiated normalized morphological inflection tables for hundreds of diverse world languages. The project comprises two major thrusts: a language-independent feature schema for rich morphological annotation and a type-level resource of annotated data in diverse languages realizing that schema. This paper presents the expansions and improvements made on several fronts over the last couple of years (since McCarthy et al., 2020). Collaborative efforts by numerous linguists have added 67 new languages, including 30 endangered languages. We have implemented several improvements to the extraction pipeline to tackle issues such as missing gender and macron information. We have also amended the schema to use a hierarchical structure that is needed for morphological phenomena like multiple-argument agreement and case stacking, while adding some missing morphological features to make the schema more inclusive. In light of the last UniMorph release, we also augmented the database with morpheme segmentation for 16 languages. Lastly, this new release makes a push towards inclusion of derivational morphology in UniMorph by enriching the data and annotation schema with instances representing derivational processes from MorphyNet.
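
To make the idea of a type-level resource with a hierarchical feature schema more concrete, the following is a minimal Python sketch of how one entry might be read. It rests on assumptions not stated in the abstract: that each entry is a tab-separated line of lemma, inflected form, and a semicolon-separated feature bundle, and that a hierarchical feature is written as a tag followed by parenthesized, comma-separated sub-features (for example `NOM(1,SG)`). The names `Entry`, `split_top_level`, `parse_feature`, and `parse_line` are hypothetical helpers for illustration, not part of any UniMorph tooling, and the sample lines are illustrative rather than taken from the released data.

```python
from dataclasses import dataclass
import re


@dataclass(frozen=True)
class Entry:
    """One inflection-table row: lemma, inflected form, feature bundle."""
    lemma: str
    form: str
    features: tuple


# Assumed shape of a hierarchical feature: TAG(sub1,sub2,...)
_NESTED = re.compile(r"^([A-Za-z0-9]+)\((.*)\)$")


def split_top_level(bundle, sep=";"):
    """Split on `sep`, ignoring separators that occur inside parentheses."""
    parts, depth, current = [], 0, []
    for ch in bundle:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        if ch == sep and depth == 0:
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
    parts.append("".join(current))
    return [p for p in parts if p]


def parse_feature(tag):
    """Return a flat tag as a string, or a nested tag as (head, sub-features)."""
    match = _NESTED.match(tag)
    if match:
        subs = tuple(s.strip() for s in match.group(2).split(","))
        return (match.group(1), subs)
    return tag


def parse_line(line):
    """Parse one assumed 'lemma<TAB>form<TAB>features' line into an Entry."""
    lemma, form, bundle = line.rstrip("\n").split("\t")
    features = tuple(parse_feature(t) for t in split_top_level(bundle))
    return Entry(lemma, form, features)


# Illustrative flat entry.
flat = parse_line("run\truns\tV;PRS;3;SG")

# Placeholder lemma and form; only the feature bundle matters here.
nested = parse_line("LEMMA\tFORM\tV;PRS;NOM(1,SG);ACC(2,PL)")
```

In this sketch, a flat tag stays a plain string, while a hierarchical tag becomes a pair of head and sub-features, so agreement with several arguments can be kept apart (here, a nominative and an accusative argument each carry their own person and number).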

