Evaluating Language Tools for Fifteen EU-official Under-resourced Languages

10/23/2020
by Diego Alves, et al.

This article presents the results of an evaluation campaign of language tools available for fifteen EU-official under-resourced languages. The evaluation was conducted within the MSC ITN CLEOPATRA action, which aims to build cross-lingual event-centric knowledge processing on top of linguistic processing chains (LPCs) for at least 24 EU-official languages. In this campaign, we concentrated on three existing NLP platforms (Stanford CoreNLP, NLP Cube, UDPipe) that all provide models for under-resourced languages, and in this first run we covered the 15 under-resourced languages for which models were available. We present the design of the evaluation campaign, then report and discuss its results. We treated a difference of no more than a single percentage point between the reported results and our tested results as being within the limits of acceptable tolerance, and we consider such results reproducible. However, for a number of languages our results are below what was reported in the literature, and in some cases our results are even better than those reported previously. The evaluation of NERC systems was particularly problematic. One of the reasons is the absence of a universally or cross-lingually applicable named entity classification scheme that would serve the NERC task across languages in the way the Universal Dependencies scheme serves the parsing task. Building such a scheme has become one of our future research directions.
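The reproducibility criterion described in the abstract can be illustrated with a minimal sketch. It assumes that "within a single percentage point" is an inclusive bound on the absolute difference; the function name `classify`, the constant `TOLERANCE_PP`, and the example scores are hypothetical and are not taken from the paper.

```python
# Minimal sketch of the one-percentage-point tolerance check.
# Assumption: the bound is inclusive (|measured - reported| <= 1.0).
TOLERANCE_PP = 1.0  # one percentage point, as stated in the abstract


def classify(reported: float, measured: float,
             tolerance: float = TOLERANCE_PP) -> str:
    """Compare a score reported in the literature with a re-measured score.

    Both scores are expressed in percent (e.g. 89.5 for 89.5%).
    """
    diff = measured - reported
    if abs(diff) <= tolerance:
        return "reproducible"
    # Outside the tolerance: say in which direction the result deviates.
    return "above reported" if diff > 0 else "below reported"


# Hypothetical (reported, measured) pairs, NOT results from the paper.
examples = {
    "lang_a": (90.0, 89.5),
    "lang_b": (85.0, 80.0),
    "lang_c": (70.0, 73.0),
}

for name, (reported, measured) in examples.items():
    print(name, classify(reported, measured))
```

The three outcomes mirror the three situations the abstract mentions: results within tolerance, results below those in the literature, and results above them.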


