Vision as an Interlingua: Learning Multilingual Semantic Embeddings of Untranscribed Speech

04/09/2018
by David Harwath, et al.

In this paper, we explore the learning of neural network embeddings for natural images and speech waveforms describing the content of those images. These embeddings are learned directly from the waveforms without the use of linguistic transcriptions or conventional speech recognition technology. While prior work has investigated this setting in the monolingual case using English speech data, this work represents the first effort to apply these techniques to languages beyond English. Using spoken captions collected in English and Hindi, we show that the same model architecture can be successfully applied to both languages. Further, we demonstrate that training a multilingual model simultaneously on both languages offers improved performance over the monolingual models. Finally, we show that these models are capable of performing semantic cross-lingual speech-to-speech retrieval.
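The cross-lingual speech-to-speech retrieval mentioned above can be pictured as nearest-neighbour search in a shared embedding space. The following is only a minimal sketch under stated assumptions: the abstract does not specify the similarity measure or the embedding size, so cosine similarity, the three-dimensional toy vectors, and the names `cosine` and `retrieve` are all hypothetical stand-ins rather than the authors' method.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def retrieve(query, candidates):
    """Return candidate indices ordered from most to least similar to the query."""
    scores = [cosine(query, c) for c in candidates]
    return sorted(range(len(candidates)), key=lambda i: scores[i], reverse=True)


# Hand-made toy vectors (assumption): in practice these would be the outputs
# of trained speech encoders for an English query and several Hindi captions.
english_query = [0.9, 0.1, 0.0]
hindi_captions = [
    [0.0, 1.0, 0.1],
    [1.0, 0.2, 0.0],
    [0.1, 0.0, 1.0],
]

ranking = retrieve(english_query, hindi_captions)
print(ranking)  # index of the best-matching Hindi caption comes first
```

Because both languages are embedded in the same space as the images, a query in one language can rank captions in the other without any transcription or translation step.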

research
06/20/2022

Multilingual HateCheck: Functional Tests for Multilingual Hate Speech Detection Models

Hate speech detection models are typically evaluated on held-out test se...
research
11/08/2021

Cascaded Multilingual Audio-Visual Learning from Videos

In this paper, we explore self-supervised audio-visual models that learn...
research
11/02/2022

M-SpeechCLIP: Leveraging Large-Scale, Pre-Trained Models for Multilingual Speech to Image Retrieval

This work investigates the use of large-scale, pre-trained models (CLIP ...
research
07/07/2022

Investigating the Impact of Cross-lingual Acoustic-Phonetic Similarities on Multilingual Speech Recognition

Multilingual automatic speech recognition (ASR) systems mostly benefit l...
research
02/08/2019

Models of Visually Grounded Speech Signal Pay Attention To Nouns: a Bilingual Experiment on English and Japanese

We investigate the behaviour of attention in neural models of visually g...
research
11/23/2020

An Online Multilingual Hate speech Recognition System

The exponential increase in the use of the Internet and social media ove...
research
04/30/2021

Paraphrastic Representations at Scale

We present a system that allows users to train their own state-of-the-ar...
