Modeling speech recognition and synthesis simultaneously: Encoding and decoding lexical and sublexical semantic information into speech with no direct access to speech data

03/22/2022
by   Gašper Beguš, et al.
2

Human speakers encode information into raw speech which is then decoded by the listeners. This complex relationship between encoding (production) and decoding (perception) is often modeled separately. Here, we test how encoding and decoding of lexical semantic information can emerge automatically from raw speech in unsupervised generative deep convolutional networks that combine the production and perception principles of speech. We introduce, to our knowledge, the most challenging objective in unsupervised lexical learning: a network that must learn unique representations for lexical items with no direct access to training data. We train several models (ciwGAN and fiwGAN arXiv:2006.02951) and test how the networks classify acoustic lexical items in unobserved test data. Strong evidence in favor of lexical learning and a causal relationship between latent codes and meaningful sublexical units emerge. The architecture that combines the production and perception principles is thus able to learn to decode unique information from raw acoustic data without accessing real training data directly. We propose a technique to explore lexical (holistic) and sublexical (featural) learned representations in the classifier network. The results bear implications for unsupervised speech technology, as well as for unsupervised semantic modeling as language models increasingly bypass text and operate from raw acoustics.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
06/04/2020

CiwGAN and fiwGAN: Encoding information in acoustic data to model lexical learning with Generative Adversarial Networks

How can deep neural networks encode information that corresponds to word...
research
10/05/2021

Interpreting intermediate convolutional layers in unsupervised acoustic word classification

Understanding how deep convolutional neural networks classify data has b...
research
11/30/2016

Deep encoding of etymological information in TEI

This paper aims to provide a comprehensive modeling and representation o...
research
09/13/2020

Identity-Based Patterns in Deep Convolutional Networks: Generative Adversarial Phonology and Reduplication

Identity-based patterns for which a computational model needs to output ...
research
05/02/2023

Basic syntax from speech: Spontaneous concatenation in unsupervised deep neural networks

Computational models of syntax are predominantly text-based. Here we pro...
research
04/19/2021

Interpreting intermediate convolutional layers of CNNs trained on raw speech

This paper presents a technique to interpret and visualize intermediate ...

Please sign up or login with your details

Forgot password? Click here to reset