Towards a Common Speech Analysis Engine

03/01/2022
by Hagai Aronowitz, et al.

Recent innovations in self-supervised representation learning have led to remarkable advances in natural language processing. In speech processing, however, systems based on self-supervised representation learning are not yet considered state-of-the-art. We propose leveraging recent advances in self-supervised speech processing to create a common speech analysis engine. Such an engine should handle multiple speech processing tasks with a single architecture while obtaining state-of-the-art accuracy. It must also support new tasks that have only small training datasets. Beyond that, a common engine should support distributed training on clients' in-house private data. We present the architecture of a common speech analysis engine based on the HuBERT self-supervised speech representation. We report experimental results for language identification on the NIST-LRE 07 evaluation and for emotion recognition on IEMOCAP; our results surpass the state-of-the-art performance reported so far on both tasks. We also analyze the engine on emotion recognition with reduced amounts of training data and show how to improve results in that setting.
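The central idea above is one shared representation serving several tasks, with new tasks added from small training sets. The following is a minimal sketch of that pattern under loud assumptions: the HuBERT encoder is replaced by a hypothetical `stand_in_encoder` (per-frame log energies of sub-segments), utterances are summarized by mean pooling, and each task gets a nearest-centroid head. All names and these modeling choices are illustrative assumptions, not the authors' architecture.

```python
import math
from typing import Callable, Dict, List, Sequence

Vector = List[float]


def stand_in_encoder(wave: Sequence[float], frame_len: int = 400,
                     hop: int = 320, dim: int = 4) -> List[Vector]:
    """Hypothetical placeholder for a shared encoder such as HuBERT.

    Returns one dim-sized vector per frame: the log energy of each of
    `dim` consecutive sub-segments of the frame.
    """
    frames: List[Vector] = []
    last_start = max(len(wave) - frame_len, 0)
    for start in range(0, last_start + 1, hop):
        chunk = wave[start:start + frame_len]
        seg = max(len(chunk) // dim, 1)
        frames.append([
            math.log(1e-8 + sum(x * x for x in chunk[i * seg:(i + 1) * seg]))
            for i in range(dim)
        ])
    return frames


def mean_pool(vectors: List[Vector]) -> Vector:
    """Average a list of equal-length vectors element-wise."""
    dim = len(vectors[0])
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]


class NearestCentroidHead:
    """Lightweight per-task classifier on pooled embeddings (an assumption)."""

    def __init__(self) -> None:
        self.centroids: Dict[str, Vector] = {}

    def fit(self, embeddings: List[Vector], labels: List[str]) -> "NearestCentroidHead":
        groups: Dict[str, List[Vector]] = {}
        for emb, lab in zip(embeddings, labels):
            groups.setdefault(lab, []).append(emb)
        # One centroid per class; works even with a single example per class.
        self.centroids = {lab: mean_pool(vs) for lab, vs in groups.items()}
        return self

    def predict(self, emb: Vector) -> str:
        # Label whose centroid has the smallest squared Euclidean distance.
        return min(self.centroids,
                   key=lambda lab: sum((a - b) ** 2
                                       for a, b in zip(emb, self.centroids[lab])))


class CommonEngine:
    """One shared (frozen) encoder feeding many small task heads."""

    def __init__(self, encoder: Callable[[Sequence[float]], List[Vector]] = stand_in_encoder) -> None:
        self.encoder = encoder
        self.heads: Dict[str, NearestCentroidHead] = {}

    def embed(self, wave: Sequence[float]) -> Vector:
        return mean_pool(self.encoder(wave))

    def add_task(self, name: str, waves: List[Sequence[float]], labels: List[str]) -> None:
        # Only the new head is fitted; the shared encoder is left untouched.
        self.heads[name] = NearestCentroidHead().fit([self.embed(w) for w in waves], labels)

    def analyze(self, wave: Sequence[float]) -> Dict[str, str]:
        # The encoder runs once; every registered task reuses the embedding.
        emb = self.embed(wave)
        return {name: head.predict(emb) for name, head in self.heads.items()}
```

A toy usage: register two tasks from one labeled example per class, then query both with a single call to `analyze`. The sine-wave inputs and task names are synthetic and chosen only for illustration; swapping `stand_in_encoder` for a real self-supervised model would change the embeddings but not the engine's structure.

```python
engine = CommonEngine()
loud = [0.9 * math.sin(0.05 * n) for n in range(1600)]
quiet = [0.01 * math.sin(0.05 * n) for n in range(1600)]
engine.add_task("loudness", [loud, quiet], ["loud", "quiet"])
print(engine.analyze([0.7 * math.sin(0.05 * n) for n in range(1600)]))
```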


research
02/07/2022

Speech Emotion Recognition using Self-Supervised Features

Self-supervised pre-trained features have consistently delivered state-o...
research
06/07/2023

Label Aware Speech Representation Learning For Language Identification

Speech representation learning approaches for non-semantic tasks such as...
research
05/05/2023

A vector quantized masked autoencoder for audiovisual speech emotion recognition

While fully-supervised models have been shown to be effective for audiov...
research
03/30/2022

Improving Distortion Robustness of Self-supervised Speech Processing Tasks with Domain Adaptation

Speech distortions are a long-standing problem that degrades the perform...
research
03/01/2022

TRILLsson: Distilled Universal Paralinguistic Speech Representations

Recent advances in self-supervision have dramatically improved the quali...
research
10/09/2021

Universal Paralinguistic Speech Representations Using Self-Supervised Conformers

Many speech applications require understanding aspects beyond the words ...
research
04/21/2023

A vector quantized masked autoencoder for speech emotion recognition

Recent years have seen remarkable progress in speech emotion recognition...
