A Correspondence Variational Autoencoder for Unsupervised Acoustic Word Embeddings

12/03/2020
by   Puyuan Peng, et al.

We propose a new unsupervised model for mapping a variable-duration speech segment to a fixed-dimensional representation. The resulting acoustic word embeddings can form the basis of search, discovery, and indexing systems for low- and zero-resource languages. Our model, which we refer to as a maximal sampling correspondence variational autoencoder (MCVAE), is a recurrent neural network (RNN) trained with a novel self-supervised correspondence loss that encourages consistency between embeddings of different instances of the same word. Our training scheme improves on previous correspondence training approaches through the use and comparison of multiple samples from the approximate posterior distribution. In the zero-resource setting, the MCVAE can be trained in an unsupervised way, without any ground-truth word pairs, by using the word-like segments discovered via an unsupervised term discovery system. In both this setting and a semi-supervised low-resource setting (with a limited set of ground-truth word pairs), the MCVAE outperforms previous state-of-the-art models, such as Siamese-, CAE- and VAE-based RNNs.
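The key idea, drawing multiple samples from the approximate posterior and keeping the sample that best reconstructs the *other* instance of the same word, can be illustrated with a toy sketch. Everything below is a hypothetical stand-in (linear maps instead of the paper's RNN encoder/decoder, squared error instead of the full likelihood, fixed posterior variance); it only shows the shape of the maximal-sampling correspondence term, not the actual MCVAE implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions (hypothetical): input dim, latent dim, posterior samples
D, Z, K = 8, 4, 5

# Hypothetical linear "encoder"/"decoder" standing in for the RNNs
W_enc = rng.normal(size=(Z, D))
W_dec = rng.normal(size=(D, Z))

def encode(x):
    """Mean and log-variance of the approximate posterior q(z|x)."""
    mu = W_enc @ x
    logvar = np.full(Z, -1.0)  # fixed variance, for the sketch only
    return mu, logvar

def decode(z):
    return W_dec @ z

def correspondence_term(x, x_pair, n_samples=K):
    """Maximal-sampling correspondence term: draw n_samples from q(z|x),
    score each by how well its decoding reconstructs the *paired*
    instance x_pair, and keep the best one (max likelihood here
    approximated as minimum squared reconstruction error)."""
    mu, logvar = encode(x)
    std = np.exp(0.5 * logvar)
    zs = mu + std * rng.normal(size=(n_samples, Z))  # reparameterised samples
    errs = [np.sum((decode(z) - x_pair) ** 2) for z in zs]
    return min(errs)

# Two instances of the "same" word (the pair would come from ground truth
# or from an unsupervised term discovery system in the zero-resource case)
x1 = rng.normal(size=D)
x2 = x1 + 0.1 * rng.normal(size=D)
loss = correspondence_term(x1, x2)
```

Taking the best of several posterior samples, rather than a single one, is what distinguishes this training scheme from earlier correspondence autoencoder objectives.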


Related research

Truly unsupervised acoustic word embeddings using weak top-down constraints in encoder-decoder models (11/01/2018)
We investigate unsupervised models that can map a variable-duration spee...

Improved acoustic word embeddings for zero-resource languages using multilingual transfer (06/02/2020)
Acoustic word embeddings are fixed-dimensional representations of variab...

Acoustic word embeddings for zero-resource languages using self-supervised contrastive learning and multilingual adaptation (03/19/2021)
Acoustic word embeddings (AWEs) are fixed-dimensional representations of...

A comparison of self-supervised speech representations as input features for unsupervised acoustic word embeddings (12/14/2020)
Many speech processing tasks involve measuring the acoustic similarity b...

Unsupervised word segmentation and lexicon discovery using acoustic word embeddings (03/09/2016)
In settings where only unlabelled speech data is available, speech techn...

Unsupervised feature learning for speech using correspondence and Siamese networks (03/28/2020)
In zero-resource settings where transcribed speech audio is unavailable,...

Acoustic Word Embeddings for Untranscribed Target Languages with Continued Pretraining and Learned Pooling (06/03/2023)
Acoustic word embeddings are typically created by training a pooling fun...
