On Investigation of Unsupervised Speech Factorization Based on Normalization Flow

10/29/2019
by Haoran Sun, et al.

Speech signals are complex composites of many kinds of information, including phonetic content, speaker traits, and channel effects. Decomposing this mixture into independent factors, i.e., speech factorization, is fundamentally important and plays a central role in many algorithms for modern speech processing tasks. In this paper, we present a preliminary investigation of unsupervised speech factorization based on the normalization flow (i.e., normalizing flow) model. This model constructs a complex invertible transform that projects speech segments into a latent code space where the distribution is a simple diagonal Gaussian. Our preliminary experiments on the TIMIT database show that this code space exhibits favorable properties such as denseness and pseudo-linearity, and that perceptually important factors such as phonetic content and speaker traits can be represented as particular directions within the code space.
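To make the idea of an invertible transform into a diagonal-Gaussian code space concrete, here is a minimal NumPy sketch. It is not the authors' model: it assumes a RealNVP-style stack of affine coupling layers (a common normalizing-flow design chosen here for illustration), uses random untrained weights, takes the target to be a unit-variance diagonal Gaussian, and treats random vectors as stand-ins for speech segments. The names `AffineCoupling` and `Flow`, and the "class-mean difference" used to estimate a factor direction, are hypothetical.

```python
import numpy as np

class AffineCoupling:
    """Hypothetical RealNVP-style coupling layer: y1 = x1, y2 = x2 * exp(s(x1)) + t(x1).

    The Jacobian is triangular, so log|det J| = sum(s(x1)).
    """

    def __init__(self, dim, hidden, rng):
        self.d = dim // 2
        # Random weights stand in for trained ones (an assumption for this sketch).
        self.w1 = rng.normal(scale=0.1, size=(self.d, hidden))
        self.ws = rng.normal(scale=0.1, size=(hidden, dim - self.d))
        self.wt = rng.normal(scale=0.1, size=(hidden, dim - self.d))

    def _s_t(self, x1):
        h = np.tanh(x1 @ self.w1)
        return np.tanh(h @ self.ws), h @ self.wt  # tanh keeps the log-scale bounded

    def forward(self, x):
        x1, x2 = x[..., :self.d], x[..., self.d:]
        s, t = self._s_t(x1)
        y2 = x2 * np.exp(s) + t
        return np.concatenate([x1, y2], axis=-1), s.sum(axis=-1)

    def inverse(self, y):
        y1, y2 = y[..., :self.d], y[..., self.d:]
        s, t = self._s_t(y1)  # y1 == x1, so s and t are recomputed exactly
        x2 = (y2 - t) * np.exp(-s)
        return np.concatenate([y1, x2], axis=-1)


class Flow:
    """Hypothetical stack of coupling layers with a feature reversal between layers."""

    def __init__(self, dim, n_layers=4, hidden=16, seed=0):
        rng = np.random.default_rng(seed)
        self.layers = [AffineCoupling(dim, hidden, rng) for _ in range(n_layers)]

    def encode(self, x):
        logdet = np.zeros(x.shape[:-1])
        for layer in self.layers:
            x, ld = layer.forward(x)
            logdet += ld
            x = x[..., ::-1]  # fixed permutation, |det| = 1
        return x, logdet

    def decode(self, z):
        for layer in reversed(self.layers):
            z = z[..., ::-1]  # undo the permutation first
            z = layer.inverse(z)
        return z

    def log_likelihood(self, x):
        # Change of variables: log p(x) = log N(z; 0, I) + log|det dz/dx|
        z, logdet = self.encode(x)
        log_pz = -0.5 * (z ** 2 + np.log(2 * np.pi)).sum(axis=-1)
        return log_pz + logdet


# Illustrative use: estimate a "speaker direction" as a difference of class means
# in code space, shift one group's codes along it, and map back to input space.
flow = Flow(dim=4)
rng = np.random.default_rng(1)
spk_a = rng.normal(size=(50, 4)) + 1.0  # synthetic stand-in for "speaker A" segments
spk_b = rng.normal(size=(50, 4)) - 1.0  # synthetic stand-in for "speaker B" segments
za, _ = flow.encode(spk_a)
zb, _ = flow.encode(spk_b)
direction = za.mean(axis=0) - zb.mean(axis=0)
converted = flow.decode(zb + direction)
```

Because each coupling layer is exactly invertible and has a cheap log-determinant, the model can be trained by maximum likelihood and any code can be decoded back to the input space; the pseudo-linearity reported in the abstract is what would make a simple additive shift along a direction, as in the last lines, meaningful.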

Related research
10/27/2020

Deep generative factorization for speech signal

Various information factors are blended in speech signals, which forms t...
02/27/2018

Deep factorization for speech signal

Various informative factors mixed in speech signals, leading to great di...
10/30/2019

Mixture factorized auto-encoder for unsupervised hierarchical deep factorization of speech signal

Speech signal is constituted and contributed by various informative fact...
03/31/2016

Multi-task Recurrent Model for Speech and Speaker Recognition

Although highly correlated, speech and speaker recognition have been reg...
11/30/2020

Look who's not talking

The objective of this work is speaker diarisation of speech recordings '...
09/25/2019

Disentangling Speech and Non-Speech Components for Building Robust Acoustic Models from Found Data

In order to build language technologies for majority of the languages, i...
06/03/2023

SpeechGen: Unlocking the Generative Power of Speech Language Models with Prompts

Large language models (LLMs) have gained considerable attention for Arti...