Vernacular Search Query Translation with Unsupervised Domain Adaptation

08/07/2022
by Mandar Kulkarni, et al.

With the democratization of e-commerce platforms, an increasingly diverse user base is opting to shop online. To provide a comfortable and reliable shopping experience, it is important to let users interact with the platform in the language of their choice. Accurate query translation is essential for Cross-Lingual Information Retrieval (CLIR) with vernacular queries. Because they operate at internet scale, e-commerce platforms receive millions of search queries every day, yet creating a parallel training set to train an in-domain translation model is cumbersome. This paper proposes an unsupervised domain adaptation approach that translates search queries without using any parallel corpus. We take an open-domain translation model (trained on a public corpus) and adapt it to the query data using only monolingual queries from the two languages. In addition, fine-tuning with a small labeled set further improves the results. For demonstration, we show results for Hindi-to-English query translation, using the mBART-large-50 model as the baseline to improve upon. Experimental results show that, without using any parallel corpus, we obtain an improvement of more than 20 BLEU points over the baseline, while fine-tuning with a small 50k labeled set yields an improvement of more than 27 BLEU points over the baseline.
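Since the results above are stated as BLEU point gains, a minimal sketch of how a corpus-level BLEU score can be computed may help in reading them. It assumes whitespace tokenization, one reference per query, uniform weights over 1- to 4-grams, and no smoothing; the abstract does not specify the evaluation setup, so these choices, the function names, and the example queries are illustrative assumptions rather than the authors' configuration.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count the n-grams of order n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Corpus-level BLEU on a 0-100 scale, one reference per hypothesis.

    Assumptions: whitespace tokenization, uniform n-gram weights, no smoothing.
    """
    matched = [0] * max_n  # clipped n-gram matches, per order
    total = [0] * max_n    # hypothesis n-grams, per order
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        hyp_tokens, ref_tokens = hyp.split(), ref.split()
        hyp_len += len(hyp_tokens)
        ref_len += len(ref_tokens)
        for n in range(1, max_n + 1):
            hyp_ngrams = ngram_counts(hyp_tokens, n)
            ref_ngrams = ngram_counts(ref_tokens, n)
            # Counter intersection clips each count to the reference count.
            matched[n - 1] += sum((hyp_ngrams & ref_ngrams).values())
            total[n - 1] += sum(hyp_ngrams.values())
    if hyp_len == 0 or min(matched) == 0:
        return 0.0  # without smoothing, any zero precision gives BLEU 0
    # Geometric mean of the n-gram precisions, computed in log space.
    log_precision = sum(math.log(m / t) for m, t in zip(matched, total)) / max_n
    # Brevity penalty: penalize hypotheses shorter than the references.
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100.0 * brevity_penalty * math.exp(log_precision)


if __name__ == "__main__":
    # Hypothetical English references and system outputs, for illustration only.
    references = ["red cotton saree for women"]
    system_output = ["red cotton saree for ladies"]
    print(f"BLEU: {corpus_bleu(system_output, references):.2f}")
```

On this 0-100 scale, a gain of "20 BLEU points" is an absolute difference between two such scores computed on the same test set. For reported numbers, an established tool such as SacreBLEU is generally preferable to a hand-rolled scorer, because tokenization and smoothing choices change the score.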


