GlobalWoZ: Globalizing MultiWoZ to Develop Multilingual Task-Oriented Dialogue Systems

10/14/2021
by Bosheng Ding, et al.

Much recent progress in task-oriented dialogue (ToD) systems has been driven by the availability of annotated training data across multiple domains. Over the last few years, there has been a move towards curating data for multilingual ToD systems that can serve people speaking different languages. However, existing multilingual ToD datasets either cover only a few languages because of the high cost of data curation, or ignore the fact that the dialogue entities they contain barely exist in the countries where these languages are spoken. To tackle these limitations, we introduce a novel data curation method that generates GlobalWoZ, a large-scale multilingual ToD dataset globalized from an English ToD dataset for three unexplored use cases. Our method is based on translating dialogue templates and filling them with local entities from the target-language countries. We release our dataset as well as a set of strong baselines to encourage research on learning multilingual ToD systems for real use cases.
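The template-filling idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: the function name `fill_template`, the `[slot]` placeholder syntax, the Spanish template, and the entity values are all hypothetical and chosen only to show how a translated template might be populated with local entities.

```python
import random

def fill_template(template: str, entities: dict[str, list[str]], rng: random.Random) -> str:
    """Replace each [slot] placeholder with a randomly chosen local entity.

    Assumption: templates are already translated and delexicalized, with
    slots written as [slot-name]. This syntax is illustrative only.
    """
    filled = template
    for slot, candidates in entities.items():
        placeholder = f"[{slot}]"
        if placeholder in filled:
            # Pick one local entity for this slot and substitute it everywhere.
            filled = filled.replace(placeholder, rng.choice(candidates))
    return filled

# Hypothetical translated template and hypothetical local entity lists.
template_es = "Busco un restaurante llamado [restaurant-name] en [restaurant-area]."
local_entities = {
    "restaurant-name": ["Casa Lucio"],
    "restaurant-area": ["el centro"],
}

print(fill_template(template_es, local_entities, random.Random(0)))
```

Passing a seeded `random.Random` keeps the sampling reproducible when each slot has more than one candidate entity.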

research
04/17/2021

Crossing the Conversational Chasm: A Primer on Multilingual Task-Oriented Dialogue Systems

Despite the fact that natural language conversations with machines repre...
research
08/27/2022

MDIA: A Benchmark for Multilingual Dialogue Generation in 46 Languages

Owing to the lack of corpora for low-resource languages, current works o...
research
12/15/2021

AllWOZ: Towards Multilingual Task-Oriented Dialog Systems for All

A commonly observed problem of the state-of-the-art natural language tec...
research
04/28/2022

EVI: Multilingual Spoken Dialogue Tasks and Dataset for Knowledge-Based Enrolment, Verification, and Identification

Knowledge-based authentication is crucial for task-oriented spoken dialo...
research
06/09/2023

I run as fast as a rabbit, can you? A Multilingual Simile Dialogue Dataset

A simile is a figure of speech that compares two different things (calle...
research
06/09/2018

GHTraffic: A Dataset for Reproducible Research in Service-Oriented Computing

We present GHTraffic, a dataset of significant size comprising HTTP tran...
research
07/26/2023

Multi3WOZ: A Multilingual, Multi-Domain, Multi-Parallel Dataset for Training and Evaluating Culturally Adapted Task-Oriented Dialog Systems

Creating high-quality annotated data for task-oriented dialog (ToD) is k...
