The effectiveness of unsupervised subword modeling with autoregressive and cross-lingual phone-aware networks

12/17/2020
by Siyuan Feng, et al.

This study addresses unsupervised subword modeling, i.e., learning acoustic feature representations that can distinguish between the subword units of a language. We propose a two-stage learning framework that combines self-supervised learning and cross-lingual knowledge transfer. The framework consists of autoregressive predictive coding (APC) as the front-end and a cross-lingual deep neural network (DNN) as the back-end. Experiments on the ABX subword discriminability task conducted with the Libri-light and ZeroSpeech 2017 databases showed that our approach is competitive with or superior to state-of-the-art studies. Comprehensive and systematic analyses at the phoneme and articulatory feature (AF) levels showed that our approach was better at capturing diphthong than monophthong vowel information, and that the amount of information captured differed across types of consonants. Moreover, a positive correlation was found between the effectiveness of the back-end in capturing a phoneme's information and the quality of the cross-lingual phone labels assigned to that phoneme. The AF-level analysis, together with t-SNE visualization results, showed that the proposed approach is better than MFCC and APC features at capturing manner and place of articulation, vowel height, and backness information. Taken together, the analyses showed that both stages of our approach are effective in capturing phoneme and AF information. Nevertheless, monophthong vowel information is captured less well than consonant information, which suggests that future research should focus on improving the capture of monophthong vowel information.
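To make the front-end idea concrete, the following is a minimal sketch of the kind of objective APC is commonly described as optimizing: predict the frame a few steps ahead of the current one and penalize the L1 distance to the true future frame. The abstract does not specify the loss, the prediction step, or the network, so the function name `apc_l1_loss`, the argument `predict_fn`, and the default `n_steps=3` are all illustrative assumptions, and the identity predictor in the usage lines is only a stand-in for a trained autoregressive model.

```python
import numpy as np

def apc_l1_loss(features, predict_fn, n_steps=3):
    """Mean L1 distance between predictions and the frames n_steps ahead.

    features   : array of shape (T, D), one acoustic frame per row.
    predict_fn : hypothetical stand-in for the autoregressive network; it maps
                 the (T - n_steps, D) input frames to same-shaped predictions.
    n_steps    : how far ahead to predict (an assumed value, not from the paper).
    """
    features = np.asarray(features, dtype=float)
    if n_steps < 1 or n_steps >= len(features):
        raise ValueError("n_steps must be between 1 and T - 1")
    inputs = features[:-n_steps]   # frames that have a target n_steps later
    targets = features[n_steps:]   # the frames n_steps ahead of each input
    predictions = predict_fn(inputs)
    return float(np.mean(np.abs(predictions - targets)))

# Usage: random frames standing in for 13-dimensional MFCCs, with a trivial
# "copy the current frame" predictor in place of a trained model.
rng = np.random.default_rng(0)
frames = rng.normal(size=(100, 13))
baseline_loss = apc_l1_loss(frames, lambda x: x, n_steps=3)
```

In an actual APC model the predictor would be a unidirectional network, so that the prediction at each time step depends only on the current and earlier frames; the hidden representations of that network, not its predictions, would then serve as the learned features.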


