Articulation GAN: Unsupervised modeling of articulatory learning

10/27/2022
by Gašper Beguš, et al.

Generative deep neural networks are widely used for speech synthesis, but most existing models directly generate waveforms or spectral outputs. Humans, however, produce speech by controlling articulators, which results in the production of speech sounds through physical properties of sound propagation. We propose a new unsupervised generative model of speech production/synthesis that includes articulatory representations and thus more closely mimics human speech production. We introduce the Articulatory Generator to the Generative Adversarial Network paradigm. The Articulatory Generator needs to learn to generate articulatory representations (electromagnetic articulography or EMA) in a fully unsupervised manner without ever accessing EMA data. A separate pre-trained physical model (ema2wav) then transforms the generated EMA representations to speech waveforms, which are sent to the Discriminator for evaluation. Articulatory analysis of the generated EMA representations suggests that the network learns to control articulators in a manner that closely follows human articulators during speech production. Acoustic analysis of the outputs suggests that the network learns to generate words that are part of the training data as well as innovative words that are absent from the training data. Our proposed architecture thus allows modeling of articulatory learning with deep neural networks from raw audio inputs in a fully unsupervised manner. We additionally discuss implications of articulatory representations for cognitive models of human language and speech technology in general.
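
The data flow described in the abstract (latent input, then generated EMA, then ema2wav, then Discriminator) can be illustrated with a toy sketch. This is not the authors' implementation: the class names, the sizes (`N_LATENT`, `N_CHANNELS`, `N_FRAMES`, `HOP`), and the sine-based stand-in for ema2wav are all hypothetical and chosen only to make the pipeline runnable with the Python standard library. The sketch covers a single forward pass and omits training entirely.

```python
import math
import random

# All sizes below are hypothetical, chosen only for illustration.
N_LATENT = 8     # size of the latent input vector
N_CHANNELS = 12  # number of EMA-like channels per frame
N_FRAMES = 20    # number of EMA frames generated
HOP = 16         # audio samples produced per EMA frame


class ArticulatoryGenerator:
    """Trainable part (toy): latent vector -> EMA-like trajectories."""

    def __init__(self, rng):
        # One random (channels x latent) matrix per frame.
        self.weights = [
            [[rng.uniform(-1, 1) for _ in range(N_LATENT)]
             for _ in range(N_CHANNELS)]
            for _ in range(N_FRAMES)
        ]

    def __call__(self, z):
        # tanh keeps every articulator position in (-1, 1).
        return [
            [math.tanh(sum(w * x for w, x in zip(row, z))) for row in frame]
            for frame in self.weights
        ]


class FrozenEma2Wav:
    """Stand-in for the pre-trained ema2wav model; it has no trainable state."""

    def __call__(self, ema):
        wav, phase = [], 0.0
        for frame in ema:
            # Toy mapping (an assumption): mean position sets the frequency,
            # the first channel sets the amplitude.
            freq = 0.05 + 0.02 * (sum(frame) / len(frame))
            amp = abs(frame[0])
            for _ in range(HOP):
                phase += 2 * math.pi * freq
                wav.append(amp * math.sin(phase))
        return wav


class Discriminator:
    """Toy linear scorer that sees only the waveform, never the EMA."""

    def __init__(self, rng):
        self.w = [rng.uniform(-1, 1) for _ in range(N_FRAMES * HOP)]

    def __call__(self, wav):
        return sum(w * x for w, x in zip(self.w, wav)) / len(wav)


def forward_pass(seed=0):
    rng = random.Random(seed)
    generator = ArticulatoryGenerator(rng)
    ema2wav = FrozenEma2Wav()
    discriminator = Discriminator(rng)
    z = [rng.uniform(-1, 1) for _ in range(N_LATENT)]
    ema = generator(z)          # articulatory representation
    wav = ema2wav(ema)          # fixed articulatory-to-acoustic mapping
    score = discriminator(wav)  # evaluation happens on audio only
    return ema, wav, score
```

The point of the sketch is structural: the Discriminator receives only the waveform, so any articulatory structure in `ema` has to be learned indirectly through the fixed ema2wav mapping, which is the sense in which the abstract calls the learning of EMA fully unsupervised.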

research
06/06/2020

Generative Adversarial Phonology: Modeling unsupervised phonetic and phonological learning with neural networks

Training deep neural networks on well-understood dependencies in speech ...
research
06/04/2020

CiwGAN and fiwGAN: Encoding information in acoustic data to model lexical learning with Generative Adversarial Networks

How can deep neural networks encode information that corresponds to word...
research
05/02/2023

Basic syntax from speech: Spontaneous concatenation in unsupervised deep neural networks

Computational models of syntax are predominantly text-based. Here we pro...
research
10/12/2021

The Mirrornet: Learning Audio Synthesizer Controls Inspired by Sensorimotor Interaction

Experiments to understand the sensorimotor neural interactions in the hu...
research
11/10/2020

Artificial sound change: Language change and deep convolutional neural networks in iterative learning

This paper proposes a framework for modeling sound change that combines ...
research
04/25/2017

Introspective Generative Modeling: Decide Discriminatively

We study unsupervised learning by developing introspective generative mo...