CLASSLA-Stanza: The Next Step for Linguistic Processing of South Slavic Languages

08/08/2023
by Luka Terčon, et al.

We present CLASSLA-Stanza, a pipeline for automatic linguistic annotation of the South Slavic languages, built on the Stanza natural language processing pipeline. We describe the main improvements of CLASSLA-Stanza over Stanza and give a detailed account of the model training process for the latest release of the pipeline, version 2.1. We also report the pipeline's performance scores for different languages and varieties. CLASSLA-Stanza performs consistently well across all the supported languages and outperforms or extends its parent pipeline Stanza on all the supported tasks. Finally, we present the pipeline's new functionality for efficient processing of web data and the reasons that led to its implementation.

