Layer-wise Fast Adaptation for End-to-End Multi-Accent Speech Recognition

04/21/2022
by   Xun Gong, et al.

Accent variability poses a major challenge to automatic speech recognition (ASR) modeling. Although adaptation systems based on one-hot accent vectors are commonly used, they require prior knowledge of the target accent and cannot handle unseen accents. Furthermore, simply concatenating accent embeddings does not make good use of accent knowledge and yields only limited improvements. In this work, we tackle these problems with a novel layer-wise adaptation structure injected into the encoder of the end-to-end (E2E) ASR model. The adapter layer encodes an arbitrary accent in the accent space and assists the ASR model in recognizing accented speech. Given an utterance, the adaptation structure extracts the corresponding accent information and transforms the input acoustic feature into an accent-related feature through a linear combination of all accent bases. We further explore the injection position of the adaptation layer, the number of accent bases, and different types of accent bases to achieve better accent adaptation. Experimental results show that, compared to the baseline, the proposed adaptation structure brings 12% and 10% relative word error rate (WER) reductions on the AESRC2020 accent dataset and the Librispeech dataset, respectively.
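
The abstract's idea of turning an input feature into an accent-related feature through a linear combination of accent bases can be illustrated with a small NumPy sketch. This is not the authors' implementation. The function name `accent_adapter`, the softmax weighting, the use of linear maps as bases, and the residual addition are all assumptions made for illustration, since the abstract does not specify them.

```python
import numpy as np


def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - np.max(x))
    return e / e.sum()


def accent_adapter(h, z, w_alpha, bases):
    """Hypothetical adapter layer (illustrative only).

    h:       (T, D) acoustic features entering an encoder layer
    z:       (E,)   utterance-level accent embedding
    w_alpha: (N, E) projection from the embedding to N basis weights (assumed)
    bases:   (N, D, D) one linear map per accent basis (assumed form)
    """
    # Interpolation weights over the N accent bases; softmax is an assumption.
    alpha = softmax(w_alpha @ z)                        # (N,)
    # Apply every basis to the input features.
    projected = np.einsum("td,nde->nte", h, bases)      # (N, T, D)
    # Linear combination of all bases gives the accent-related feature.
    accent_feature = np.tensordot(alpha, projected, axes=1)  # (T, D)
    # Residual connection back to the input is an assumption.
    return h + accent_feature, alpha


rng = np.random.default_rng(0)
h = rng.standard_normal((5, 4))         # 5 frames, 4-dim features
z = rng.standard_normal(3)              # 3-dim accent embedding
w_alpha = rng.standard_normal((2, 3))   # 2 accent bases
bases = rng.standard_normal((2, 4, 4))
adapted, alpha = accent_adapter(h, z, w_alpha, bases)
print(adapted.shape, alpha)
```

Because the weights are computed from the utterance's own embedding rather than from a one-hot accent label, a sketch like this can in principle place an unseen accent somewhere between the learned bases, which is the motivation the abstract gives for the design.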

research
03/01/2022

A Conformer Based Acoustic Model for Robust Automatic Speech Recognition

This study addresses robust automatic speech recognition (ASR) by introd...
research
02/24/2017

Residual Convolutional CTC Networks for Automatic Speech Recognition

Deep learning approaches have been widely used in Automatic Speech Recog...
research
11/05/2020

Multi-Accent Adaptation based on Gate Mechanism

When only a limited amount of accented speech data is available, to prom...
research
05/25/2023

Unified Modeling of Multi-Talker Overlapped Speech Recognition and Diarization with a Sidecar Separator

Multi-talker overlapped speech poses a significant challenge for speech ...
research
10/06/2021

Internal Language Model Adaptation with Text-Only Data for End-to-End Speech Recognition

Text-only adaptation of an end-to-end (E2E) model remains a challenging ...
research
03/27/2022

Listen, Adapt, Better WER: Source-free Single-utterance Test-time Adaptation for Automatic Speech Recognition

Although deep learning-based end-to-end Automatic Speech Recognition (AS...
research
02/16/2018

Interpreting DNN output layer activations: A strategy to cope with unseen data in speech recognition

Unseen data can degrade performance of deep neural net acoustic models. ...
