Cross-lingual Transfer Learning for Multilingual Task Oriented Dialog

10/31/2018
by Sebastian Schuster, et al.

One of the first steps in the utterance interpretation pipeline of many task-oriented conversational AI systems is to identify user intents and the corresponding slots. Neural sequence labeling models have achieved very high accuracy on these tasks when trained on large amounts of training data. However, collecting this data is very time-consuming, and it is therefore infeasible to collect large amounts of data for many languages. For this reason, it is desirable to make use of existing data in a high-resource language to train models in low-resource languages. In this paper, we investigate the performance of three different methods for cross-lingual transfer learning, namely (1) translating the training data, (2) using cross-lingual pre-trained embeddings, and (3) a novel method of using a multilingual machine translation encoder as contextual word representations. We find that given several hundred training examples in the target language, the latter two methods outperform translating the training data. Further, in very low-resource settings, we find that multilingual contextual word representations give better results than using cross-lingual static embeddings. We release a dataset of around 57k annotated utterances in English (43k), Spanish (8.6k) and Thai (5k) for three task-oriented domains at https://fb.me/multilingual_task_oriented_data.
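
To make the intent and slot terminology concrete, the following is a minimal sketch of how one annotated utterance might be represented and how slot spans can be read off token-level tags. It assumes a BIO tagging scheme, and the intent label, slot name, and the `extract_slots` helper are hypothetical illustrations, not the format or code of the released dataset.

```python
from typing import List, Tuple


def extract_slots(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Collect (slot_name, slot_text) pairs from BIO tags (assumed scheme)."""
    slots: List[Tuple[str, str]] = []
    current_name = None
    current_tokens: List[str] = []
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A new slot begins; close any slot that is still open.
            if current_name is not None:
                slots.append((current_name, " ".join(current_tokens)))
            current_name = tag[2:]
            current_tokens = [token]
        elif tag.startswith("I-") and current_name == tag[2:]:
            # Continuation of the slot that is currently open.
            current_tokens.append(token)
        else:
            # "O" tag or an inconsistent "I-" tag: close the open slot, if any.
            if current_name is not None:
                slots.append((current_name, " ".join(current_tokens)))
            current_name = None
            current_tokens = []
    if current_name is not None:
        slots.append((current_name, " ".join(current_tokens)))
    return slots


# Hypothetical example: one utterance-level intent, one tag per token.
tokens = ["set", "an", "alarm", "for", "7", "am", "tomorrow"]
tags = ["O", "O", "O", "O", "B-datetime", "I-datetime", "I-datetime"]
intent = "alarm/set_alarm"

print(intent, extract_slots(tokens, tags))
```

In this framing, intent detection is a classification over the whole utterance, while slot filling is a sequence labeling problem over its tokens, which is why the abstract refers to neural sequence labeling models.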

research
01/29/2023

Improving Cross-lingual Information Retrieval on Low-Resource Languages via Optimal Transport Distillation

Benefiting from transformer-based pre-trained language models, neural ra...
research
10/02/2020

Automatic Extraction of Rules Governing Morphological Agreement

Creating a descriptive grammar of a language is an indispensable step fo...
research
11/27/2018

Cross-Lingual Approaches to Reference Resolution in Dialogue Systems

In the slot-filling paradigm, where a user can refer back to slots in th...
research
06/05/2021

BiToD: A Bilingual Multi-Domain Dataset For Task-Oriented Dialogue Modeling

Task-oriented dialogue (ToD) benchmarks provide an important avenue to m...
research
05/12/2021

Multilingual Offensive Language Identification for Low-resource Languages

Offensive content is pervasive in social media and a reason for concern ...
research
05/19/2020

Cross-lingual Transfer Learning for Dialogue Act Recognition

This paper deals with cross-lingual transfer learning for dialogue act (...
research
10/11/2020

Multilingual Offensive Language Identification with Cross-lingual Embeddings

Offensive content is pervasive in social media and a reason for concern ...
