Synthetic Cross-accent Data Augmentation for Automatic Speech Recognition

03/01/2023
by Philipp Klumpp et al.

Awareness of bias in ASR datasets and models has increased notably in recent years. Even for English, despite the vast amount of available training data, systems perform worse for non-native speakers. In this work, we improve an accent-conversion model (ACM) that transforms native US-English speech into accented pronunciation. We include phonetic knowledge in the ACM training to provide accurate feedback on how well specific pronunciation patterns are recovered in the synthesized waveform. Furthermore, we investigate the feasibility of learned accent representations instead of static embeddings. The generated data was then used to train two state-of-the-art ASR systems. We evaluated our approach on native and non-native English datasets and found that synthetically accented data helped the ASR systems better understand speech from seen accents. This improvement did not carry over to unseen accents, and it was not observed for a model that had been pre-trained exclusively on native speech.
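The abstract compares ASR performance on native and non-native test sets but does not name the metric. Such comparisons are usually reported as word error rate (WER). The following is a minimal sketch, assuming WER is the metric, that computes corpus-level WER separately for each speaker group. The helper names `word_edit_distance` and `corpus_wer` are hypothetical, and the transcripts are made-up placeholders, not data or results from the paper.

```python
from typing import Dict, List, Sequence, Tuple


def word_edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance over words: the minimum number of substitutions,
    insertions and deletions needed to turn ref into hyp."""
    prev = list(range(len(hyp) + 1))  # distance from empty ref to hyp[:j]
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,              # delete reference word r
                curr[j - 1] + 1,          # insert hypothesis word h
                prev[j - 1] + (r != h),   # substitute (cost 1) or match (cost 0)
            )
        prev = curr
    return prev[-1]


def corpus_wer(pairs: List[Tuple[str, str]]) -> float:
    """Corpus-level WER: total word errors divided by total reference words."""
    errors = sum(word_edit_distance(ref.split(), hyp.split()) for ref, hyp in pairs)
    words = sum(len(ref.split()) for ref, _ in pairs)
    if words == 0:
        raise ValueError("reference transcripts contain no words")
    return errors / words


# Hypothetical toy (reference, hypothesis) pairs, grouped by speaker type.
test_sets: Dict[str, List[Tuple[str, str]]] = {
    "native": [("the cat sat on the mat", "the cat sat on the mat")],
    "non-native": [("the cat sat on the mat", "the cat set on mat")],
}

for name, pairs in test_sets.items():
    print(f"{name}: WER = {corpus_wer(pairs):.3f}")
```

Running it prints `native: WER = 0.000` and `non-native: WER = 0.333` (one substitution plus one deletion over six reference words). Pooling errors over the whole corpus, rather than averaging per-utterance WERs, keeps short utterances from dominating the score.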
