Interpreting intermediate convolutional layers of CNNs trained on raw speech

04/19/2021
by Gašper Beguš et al.

This paper presents a technique for interpreting and visualizing intermediate layers of CNNs trained on raw speech data in an unsupervised manner. We show that averaging over feature maps after the ReLU activation in each convolutional layer yields interpretable time-series data. The proposed technique enables acoustic analysis of intermediate convolutional layers. To uncover how meaningful representations of speech are encoded in intermediate layers of CNNs, we manipulate individual latent variables to marginal levels outside the training range. We train and probe internal representations in two models: a bare WaveGAN architecture and a ciwGAN extension, which forces the Generator to output informative data and results in the emergence of linguistically meaningful representations. Interpretation and visualization are performed for three basic acoustic properties of speech: periodic vibration (corresponding to vowels), aperiodic noise (corresponding to fricatives), and silence (corresponding to stops). We also argue that the proposed technique allows acoustic analysis of intermediate layers that parallels the acoustic analysis of human speech data: F0, intensity, duration, formants, and other acoustic properties can be extracted from intermediate layers to test where and how CNNs encode various types of information. The models are trained on two speech processes of differing complexity: the simple presence of [s] and the computationally complex presence of reduplication (copied material). Observing the causal effect between latent-space interpolation and the resulting changes in intermediate layers can reveal how individual variables are transformed into spikes of activation in intermediate layers. Using the proposed technique, we can analyze how linguistically meaningful units in speech are encoded across convolutional layers.
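The two probing steps described in the abstract can be sketched in a few lines of PyTorch. This is a minimal illustration, not the paper's released code: the three-layer transposed-convolution stack below is a toy stand-in for the WaveGAN Generator, and the layer names and latent dimensions are assumptions. It shows (1) capturing each layer's post-ReLU output and averaging over the feature-map axis to obtain one interpretable time series per layer, and (2) setting a single latent variable to a marginal value outside the training range to probe its causal effect on those activations.

```python
# Sketch of the feature-map-averaging technique; the architecture here is a
# hypothetical toy stand-in for WaveGAN's Generator, not the paper's model.
import torch
import torch.nn as nn

# Toy Generator: latent input -> raw waveform, with ReLU after each conv layer.
generator = nn.Sequential(
    nn.ConvTranspose1d(100, 64, kernel_size=25, stride=4,
                       padding=11, output_padding=1),
    nn.ReLU(),
    nn.ConvTranspose1d(64, 32, kernel_size=25, stride=4,
                       padding=11, output_padding=1),
    nn.ReLU(),
    nn.ConvTranspose1d(32, 1, kernel_size=25, stride=4,
                       padding=11, output_padding=1),
    nn.Tanh(),
)

captured = {}  # layer name -> averaged time series

def make_hook(name):
    def hook(module, inputs, output):
        # Average over the channel (feature-map) dimension: the result is a
        # single interpretable time series per convolutional layer.
        captured[name] = output.mean(dim=1).detach()
    return hook

# Attach hooks to every post-convolution ReLU in the Generator.
for name, module in generator.named_modules():
    if isinstance(module, nn.ReLU):
        module.register_forward_hook(make_hook(name))

z = torch.randn(1, 100, 16)   # latent input (training range roughly [-1, 1])
waveform = generator(z)       # forward pass fills `captured`

# Manipulate one latent variable to a marginal level outside the training
# range and rerun the Generator to observe the causal effect on activations.
z_marginal = z.clone()
z_marginal[0, 0, :] = 5.0     # well beyond the training interval
_ = generator(z_marginal)
```

Each entry of `captured` is a one-dimensional signal over time, so standard acoustic analysis (F0 tracking, intensity, formant estimation) can be applied to it just as to a waveform.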


