Cognitive Coding of Speech

10/08/2021
by Reza Lotfidereshgi, et al.

We propose an approach for cognitive coding of speech by unsupervised extraction of contextual representations at two hierarchical levels of abstraction. Speech attributes such as phoneme identity, which last one hundred milliseconds or less, are captured at the lower level of abstraction, while speech attributes such as speaker identity and emotion, which persist for up to one second, are captured at the higher level. This decomposition is achieved by a two-stage neural network, with a lower and an upper stage operating at different time scales. Both stages are trained to predict the content of the signal in their respective latent spaces. A top-down pathway between the stages further improves the predictive capability of the network. With an application to speech compression in mind, we investigate the effect of dimensionality reduction and low-bitrate quantization on the extracted representations. The performance measured on the LibriSpeech and EmoV-DB datasets reaches, and for some speech attributes even exceeds, that of state-of-the-art approaches.
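
To make the link between dimensionality, quantization, and bitrate concrete, the following is a minimal sketch of a uniform scalar quantizer and the resulting bitrate arithmetic. It is not the authors' method: the function names, the value range, and all numbers (dimensions, bits per dimension, vectors per second) are hypothetical and chosen only for illustration.

```python
def uniform_quantize(values, bits, lo=-1.0, hi=1.0):
    """Map each value to the nearest of 2**bits evenly spaced levels in [lo, hi].

    Assumption: representations are roughly bounded in [lo, hi]; values
    outside that range are clipped before quantization.
    """
    levels = 2 ** bits
    step = (hi - lo) / (levels - 1)
    quantized = []
    for v in values:
        clipped = min(max(v, lo), hi)
        index = round((clipped - lo) / step)  # integer code sent to the decoder
        quantized.append(lo + index * step)   # reconstructed value
    return quantized


def bitrate_bps(dims, bits_per_dim, vectors_per_second):
    """Bits per second needed to transmit one stream of quantized vectors."""
    return dims * bits_per_dim * vectors_per_second


# Hypothetical two-level setup: a fast lower-level stream and a slow upper-level one.
lower = bitrate_bps(dims=32, bits_per_dim=2, vectors_per_second=100)
upper = bitrate_bps(dims=8, bits_per_dim=2, vectors_per_second=1)
total = lower + upper

print(uniform_quantize([0.1, -2.0, 0.9], bits=1))
print(lower, upper, total)
```

Under these assumed numbers, the slowly varying upper-level stream adds very little to the total bitrate, which is one reason that separating long-term attributes from short-term ones can be attractive for compression.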


