LAMASSU: Streaming Language-Agnostic Multilingual Speech Recognition and Translation Using Neural Transducers

11/05/2022
by   Peidong Wang, et al.

The end-to-end formulation of automatic speech recognition (ASR) and speech translation (ST) makes it easy to use a single model for both multilingual ASR and many-to-many ST. In this paper, we propose streaming language-agnostic multilingual speech recognition and translation using neural transducers (LAMASSU). To enable multilingual text generation in LAMASSU, we conduct a systematic comparison between specified and unified prediction and joint networks. We leverage a language-agnostic multilingual encoder that substantially outperforms shared encoders. To enhance LAMASSU, we propose feeding the target language ID (LID) to the encoders. We also apply connectionist temporal classification (CTC) regularization to transducer training. Experimental results show that LAMASSU not only drastically reduces the model size but also outperforms monolingual ASR and bilingual ST models.
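
The abstract does not say how the target LID is presented to the encoders. One common way to condition a network on a language tag is to append a one-hot vector to every input frame, and the sketch below illustrates that idea only. The language inventory, the function names `one_hot_lid` and `append_target_lid`, and the toy feature values are all assumptions made for illustration; they are not taken from the paper.

```python
from typing import List, Sequence

# Hypothetical language inventory; the abstract does not list the languages.
LANGUAGES = ("en", "de", "zh")


def one_hot_lid(target_lang: str, languages: Sequence[str] = LANGUAGES) -> List[float]:
    """Return a one-hot vector marking the target language."""
    if target_lang not in languages:
        raise ValueError(f"unknown target language: {target_lang!r}")
    return [1.0 if lang == target_lang else 0.0 for lang in languages]


def append_target_lid(
    frames: Sequence[Sequence[float]],
    target_lang: str,
    languages: Sequence[str] = LANGUAGES,
) -> List[List[float]]:
    """Concatenate the target-LID one-hot vector to every acoustic frame.

    This is an assumed conditioning scheme, not the paper's confirmed method.
    """
    lid = one_hot_lid(target_lang, languages)
    return [list(frame) + lid for frame in frames]


# Two toy 3-dimensional acoustic frames (made-up values).
frames = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]

# Each frame grows by len(LANGUAGES) dimensions before entering the encoder.
encoder_input = append_target_lid(frames, "de")
```

Under this assumption the same audio can be steered toward recognition or translation simply by changing `target_lang`, which is what lets one encoder serve many output languages.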

research
09/13/2022

Learning ASR pathways: A sparse multilingual ASR model

Neural network pruning can be effectively applied to compress automatic ...
research
08/29/2022

A Language Agnostic Multilingual Streaming On-Device ASR System

On-device end-to-end (E2E) models have shown improvements over a convent...
research
09/13/2022

Streaming End-to-End Multilingual Speech Recognition with Joint Language Identification

Language identification is critical for many downstream tasks in automat...
research
07/07/2023

Token-Level Serialized Output Training for Joint Streaming ASR and ST Leveraging Textual Alignments

In real-world applications, users often require both translations and tr...
research
12/16/2022

BLASER: A Text-Free Speech-to-Speech Translation Evaluation Metric

End-to-End speech-to-speech translation (S2ST) is generally evaluated wi...
research
11/11/2022

Align, Write, Re-order: Explainable End-to-End Speech Translation via Operation Sequence Generation

The black-box nature of end-to-end speech translation (E2E ST) systems m...
research
05/16/2023

Application-Agnostic Language Modeling for On-Device ASR

On-device automatic speech recognition systems face several challenges c...
