Reducing language context confusion for end-to-end code-switching automatic speech recognition

01/28/2022
by Shuai Zhang, et al.

Code-switching is the alternation between languages within a single conversation. Training end-to-end (E2E) automatic speech recognition (ASR) systems for code-switching is known to be challenging because data is scarce and because the presence of more than one language increases language context confusion. In this paper, we propose a language-related attention mechanism, based on the Equivalence Constraint (EC) Theory, to reduce multilingual context confusion in the E2E code-switching ASR model. This linguistic theory requires that any monolingual fragment occurring in a code-switching sentence must also occur in one of the monolingual sentences, which establishes a bridge between monolingual data and code-switching data. By computing attention separately for each language, our method can efficiently transfer language knowledge from rich monolingual data. We evaluate our method on the ASRU 2019 Mandarin-English code-switching challenge dataset. Compared with the baseline model, the proposed method achieves a mix error rate reduction of 11.37.
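The abstract reports results as mix error rate without defining it. The sketch below shows one common way such a metric is computed for Mandarin-English speech, assuming Mandarin is scored per character and English per word. The function names (`tokenize_mixed`, `edit_distance`, `mix_error_rate`) and the tokenization rule are illustrative assumptions, not the authors' scoring script.

```python
import re

# Assumption: Mandarin is scored per character and English per word,
# a usual convention for Mandarin-English mix error rate.
_TOKEN = re.compile(r"[\u4e00-\u9fff]|[A-Za-z']+")


def tokenize_mixed(text):
    """Split text into Mandarin characters and lowercased English words."""
    return [t.lower() for t in _TOKEN.findall(text)]


def edit_distance(ref, hyp):
    """Levenshtein distance between two token lists."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # deletion
                curr[j - 1] + 1,         # insertion
                prev[j - 1] + (r != h),  # substitution or match
            ))
        prev = curr
    return prev[-1]


def mix_error_rate(refs, hyps):
    """Total edit errors over total reference tokens, as a percentage."""
    errors = tokens = 0
    for ref, hyp in zip(refs, hyps):
        r, h = tokenize_mixed(ref), tokenize_mixed(hyp)
        errors += edit_distance(r, h)
        tokens += len(r)
    return 100.0 * errors / tokens
```

For example, `mix_error_rate(["我想听 hello world"], ["我想听 hello word"])` counts one substitution over five reference tokens (three characters and two words) and returns 20.0.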

research
04/08/2019

Constrained Output Embeddings for End-to-End Code-Switching Speech Recognition with Only Monolingual Data

The lack of code-switch training data is one of the major concerns in th...
research
10/30/2018

Towards End-to-end Automatic Code-Switching Speech Recognition

Speech recognition in mixed language has difficulties to adapt end-to-en...
research
10/28/2020

Decoupling Pronunciation and Language for End-to-end Code-switching Automatic Speech Recognition

Despite the recent significant advances witnessed in end-to-end (E2E) AS...
research
11/30/2020

Transformer-Transducers for Code-Switched Speech Recognition

We live in a world where 60% of the population can speak two or more languages fluently. Members of these communi...
research
10/28/2018

Language Modeling for Code-Switching: Evaluation, Integration of Monolingual Data, and Discriminative Training

We focus on the problem of language modeling for code-switched language,...
research
05/31/2021

Towards One Model to Rule All: Multilingual Strategy for Dialectal Code-Switching Arabic ASR

With the advent of globalization, there is an increasing demand for mult...
research
09/18/2019

Code-Switched Language Models Using Neural Based Synthetic Data from Parallel Sentences

Training code-switched language models is difficult due to lack of data ...
