Exploiting Cross-domain And Cross-Lingual Ultrasound Tongue Imaging Features For Elderly And Dysarthric Speech Recognition

06/15/2022
by Shujie Hu, et al.
Articulatory features are inherently invariant to acoustic signal distortion and have been successfully incorporated into automatic speech recognition (ASR) systems designed for normal speech. Their practical application to atypical task domains such as elderly and disordered speech across languages is often limited by the difficulty of collecting such specialist data from target speakers. This paper presents a cross-domain and cross-lingual acoustic-to-articulatory (A2A) inversion approach that uses the parallel audio, visual and ultrasound tongue imaging (UTI) data of the 24-hour TaL corpus for A2A model pre-training. The pre-trained model is then cross-domain and cross-lingually adapted to three datasets across two languages to produce UTI-based articulatory features: the English DementiaBank Pitt and Cantonese JCCOCC MoCA elderly speech corpora, and the English TORGO dysarthric speech data. Experiments on the three tasks suggest that systems incorporating the generated articulatory features consistently outperformed the baseline hybrid TDNN and Conformer-based end-to-end systems constructed using acoustic features only, with statistically significant word error rate or character error rate reductions of up to 2.64 (relative) after data augmentation and speaker adaptation were applied.


research
03/19/2022

Exploiting Cross Domain Acoustic-to-articulatory Inverted Features For Disordered Speech Recognition

Articulatory features are inherently invariant to acoustic signal distor...
research
01/15/2022

Recent Progress in the CUHK Dysarthric Speech Recognition System

Despite the rapid progress of automatic speech recognition (ASR) technol...
research
07/28/2020

Autosegmental Neural Nets: Should Phones and Tones be Synchronous or Asynchronous?

Phones, the segmental units of the International Phonetic Alphabet (IPA)...
research
02/28/2023

Exploring Self-supervised Pre-trained ASR Models For Dysarthric and Elderly Speech Recognition

Automatic recognition of disordered and elderly speech remains a highly ...
research
05/15/2018

Improved ASR for Under-Resourced Languages Through Multi-Task Learning with Acoustic Landmarks

Furui first demonstrated that the identity of both consonant and vowel c...
research
06/23/2022

Two-pass Decoding and Cross-adaptation Based System Combination of End-to-end Conformer and Hybrid TDNN ASR Systems

Fundamental modelling differences between hybrid and end-to-end (E2E) au...
research
12/21/2020

Unsupervised Cross-Lingual Speech Emotion Recognition Using DomainAdversarial Neural Network

By using deep learning approaches, Speech Emotion Recognition (SER) on ...
