Speaker adaptation for Wav2vec2 based dysarthric ASR

04/02/2022
by Murali Karthick Baskar, et al.

Dysarthric speech recognition poses major challenges due to the lack of training data and the heavy mismatch in speaker characteristics. Recent ASR systems have benefited from readily available pretrained models such as wav2vec2 to improve recognition performance. Speaker adaptation using fMLLR and xvectors has provided major gains for dysarthric speech with very little adaptation data. However, the integration of fMLLR features or xvectors into wav2vec2 during finetuning has yet to be explored. In this work, we propose a simple adaptation network for fine-tuning wav2vec2 using fMLLR features. The adaptation network is also flexible enough to handle other speaker adaptive features such as xvectors. Experimental analysis shows steady improvements with our proposed approach across all impairment severity levels, attaining 57.72% WER for high severity on the UASpeech dataset. We also performed experiments on a German dataset to substantiate the consistency of our proposed approach across diverse domains.
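
For readers unfamiliar with the metric, the WER quoted above is conventionally the word-level edit distance (substitutions, deletions, and insertions) between the reference transcript and the recognizer output, divided by the number of reference words. The following is a minimal sketch of that standard definition, not the scoring tool used in this work; the function name `word_error_rate` and the example sentences are illustrative assumptions.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER in percent using word-level Levenshtein distance.

    Illustrative helper (hypothetical name); whitespace tokenization is an
    assumption, and real scoring tools usually normalize text first.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr

    return 100.0 * prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution out of three reference words.
    print(f"{word_error_rate('open the door', 'open a door'):.2f}")
```

Because insertions count as errors, WER can exceed 100% when the hypothesis is much longer than the reference.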
