Synchronous Bidirectional Learning for Multilingual Lip Reading

by Mingshuang Luo, et al.

Lip reading has received increasing attention in recent years. This paper focuses on the synergy of multilingual lip reading. There are more than 7,000 languages in the world, so it is impractical to train a separate lip reading model for each language by collecting large-scale data per language. Although each language has its own linguistic and pronunciation features, the lip movements of all languages share similar patterns. Based on this idea, we explore the synergized learning of multilingual lip reading and propose a synchronous bidirectional learning (SBL) framework for effective synergy of multilingual lip reading. First, we introduce phonemes as our modeling units for the multilingual setting. Similar phonemes always lead to similar visual patterns. The multilingual setting increases both the quantity and the diversity of each phoneme shared among different languages, so learning for the multilingual target should improve the prediction of phonemes. Then, an SBL block is proposed to infer the target unit given its previous and later context. The rules of each specific language, as identified by the model itself, are learned in this fill-in-the-blank manner. To make the learning process more targeted at each particular language, we introduce an extra task of predicting the language identity in the learning process. Finally, we perform a thorough comparison on LRW (English) and LRW-1000 (Mandarin). The results outperform the existing state of the art by a large margin and show the promising benefits of the synergized learning of different languages.
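The fill-in-the-blank idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not describe the SBL block's architecture, so the code below only shows how training examples of the form (previous context, later context, target unit, language identity) might be built from a phoneme sequence. The names `BlankExample` and `fill_in_the_blank_examples`, the language codes, and the phoneme symbols are all hypothetical and chosen for illustration.

```python
from typing import List, NamedTuple


class BlankExample(NamedTuple):
    """One fill-in-the-blank example (hypothetical structure)."""
    left: List[str]    # phonemes before the blank (previous context)
    right: List[str]   # phonemes after the blank (later context)
    target: str        # the phoneme to be inferred
    language: str      # label for the auxiliary language-identity task


def fill_in_the_blank_examples(phonemes: List[str], language: str) -> List[BlankExample]:
    """Blank out each position in turn and keep both sides as context."""
    examples = []
    for i, target in enumerate(phonemes):
        examples.append(
            BlankExample(
                left=list(phonemes[:i]),
                right=list(phonemes[i + 1:]),
                target=target,
                language=language,
            )
        )
    return examples


# Illustrative phoneme sequences only; the symbols are assumptions,
# not taken from LRW or LRW-1000.
english = fill_in_the_blank_examples(["HH", "AH", "L", "OW"], "en")
mandarin = fill_in_the_blank_examples(["n", "i", "h", "ao"], "zh")

# Pooling both languages gives one shared training set, where each
# example carries its language label alongside the phoneme target.
multilingual = english + mandarin
```

In a setup like this, a model would be trained to predict `target` from `left` and `right`, with `language` as the label for the extra language-identity task. How the paper actually combines the two objectives is not stated in the abstract.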

