Improved acoustic-to-articulatory inversion using representations from pretrained self-supervised learning models

10/30/2022
by   Sathvik Udupa, et al.

In this work, we investigate the effectiveness of pretrained Self-Supervised Learning (SSL) features for learning the acoustic-to-articulatory inversion (AAI) mapping. Signal processing-based acoustic features such as MFCCs have been predominantly used for the AAI task with deep neural networks. Since SSL features work well for various other speech tasks, such as speech recognition and emotion classification, we examine their efficacy for AAI. We train transformer-based AAI models of three different complexities on SSL features and compare their performance with that of MFCCs in subject-specific (SS), pooled, and fine-tuned (FT) configurations, using data from 10 subjects and evaluating with the correlation coefficient (CC) score on a test set of unseen sentences. We find that SSL features trained with an acoustic feature reconstruction objective, such as TERA and DeCoAR, work well for AAI, with the SS CCs of these features coming close to the best FT CCs obtained with MFCCs. The results are also consistent across model sizes.
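The CC score used for evaluation can be illustrated with a minimal sketch. The abstract does not say how the CC is computed, so the code below assumes the Pearson correlation between a predicted and a reference articulatory trajectory, averaged over articulator channels. The function names `pearson_cc` and `mean_cc` are hypothetical and are not taken from the paper.

```python
import math
from typing import Sequence


def pearson_cc(pred: Sequence[float], target: Sequence[float]) -> float:
    """Pearson correlation between two equal-length trajectories (assumed metric)."""
    if len(pred) != len(target) or len(pred) < 2:
        raise ValueError("trajectories must have equal length of at least 2")
    n = len(pred)
    mean_p = sum(pred) / n
    mean_t = sum(target) / n
    # Unnormalised covariance and variances; the 1/n factors cancel in the ratio.
    cov = sum((p - mean_p) * (t - mean_t) for p, t in zip(pred, target))
    var_p = sum((p - mean_p) ** 2 for p in pred)
    var_t = sum((t - mean_t) ** 2 for t in target)
    if var_p == 0 or var_t == 0:
        raise ValueError("correlation is undefined for a constant trajectory")
    return cov / math.sqrt(var_p * var_t)


def mean_cc(
    pred_channels: Sequence[Sequence[float]],
    target_channels: Sequence[Sequence[float]],
) -> float:
    """Average the per-channel CC; each channel is one articulator's frame sequence."""
    if len(pred_channels) != len(target_channels) or not pred_channels:
        raise ValueError("need the same non-zero number of channels in both inputs")
    scores = [pearson_cc(p, t) for p, t in zip(pred_channels, target_channels)]
    return sum(scores) / len(scores)
```

Under this assumption, a score near 1 means the predicted trajectory follows the shape of the reference closely, regardless of any constant offset or scale difference between the two.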

research
09/03/2023

Acoustic-to-articulatory inversion for dysarthric speech: Are pre-trained self-supervised representations favorable?

Acoustic-to-articulatory inversion (AAI) involves mapping from the acous...
research
09/17/2023

Improving Speech Inversion Through Self-Supervised Embeddings and Enhanced Tract Variables

The performance of deep learning models depends significantly on their c...
research
12/28/2020

Lattice-Free MMI Adaptation Of Self-Supervised Pretrained Acoustic Models

In this work, we propose lattice-free MMI (LFMMI) for supervised adaptat...
research
10/29/2022

The Secret Source: Incorporating Source Features to Improve Acoustic-to-Articulatory Speech Inversion

In this work, we incorporated acoustically derived source features, aper...
research
05/14/2023

Self-supervised Neural Factor Analysis for Disentangling Utterance-level Speech Representations

Self-supervised learning (SSL) speech models such as wav2vec and HuBERT ...
research
04/05/2022

Repeat after me: Self-supervised learning of acoustic-to-articulatory mapping by vocal imitation

We propose a computational model of speech production combining a pre-tr...
research
03/11/2022

Acoustic To Articulatory Speech Inversion Using Multi-Resolution Spectro-Temporal Representations Of Speech Signals

Multi-resolution spectro-temporal features of a speech signal represent ...
