MT-Adapted Datasheets for Datasets: Template and Repository

05/27/2020
by   Marta R. Costa-Jussà, et al.

In this report, we apply the standardized datasheet model proposed by Gebru et al. (2018) to document two popular machine translation datasets, EuroParl (Koehn, 2005) and News-Commentary (Barrault et al., 2019). As part of this documentation process, we adapt the original datasheet to the needs of data consumers in the machine translation area. We also propose a repository for collecting the adapted datasheets in this research area.


research
11/28/2022

Considerations for meaningful sign language machine translation based on glosses

Automatic sign language processing is gaining popularity in Natural Lang...
research
12/14/2021

Building on Huang et al. GlossBERT for Word Sense Disambiguation

We propose to take on the problem of Word Sense Disambiguation (WSD). In ...
research
09/09/2019

Combining SMT and NMT Back-Translated Data for Efficient NMT

Neural Machine Translation (NMT) models achieve their best performance w...
research
03/31/2020

On the Integration of Linguistic Features into Statistical and Neural Machine Translation

New machine translation (MT) technologies are emerging rapidly and with...
research
04/26/2022

Efficient Machine Translation Domain Adaptation

Machine translation models struggle when translating out-of-domain text,...
research
09/15/2021

Miðeind's WMT 2021 submission

We present Miðeind's submission for the English→Icelandic and Icelandic→...
research
05/19/2023

Algorithmic failure as a humanities methodology: machine learning's mispredictions identify rich cases for qualitative analysis

This commentary tests a methodology proposed by Munk et al. (2022) for u...
