Recent breakthroughs in zero-shot voice synthesis have enabled imitating...
Recent end-to-end automatic speech recognition (ASR) systems often utili...
The integration of Language Models (LMs) has proven to be an effective w...
Joint speech-language training is challenging due to the large demand fo...
Code-switching speech refers to a means of expression by mixing two or m...
We propose gated language experts to improve multilingual transformer tr...
We introduce a language modeling approach for text to speech synthesis (...
The massive growth of self-supervised learning (SSL) has been witnessed ...
Although speech is a simple and effective way for humans to communicate ...
There is a surge in interest in self-supervised learning approaches for ...
Traditional automatic speech recognition (ASR) systems usually focus on ...
End-to-end formulation of automatic speech recognition (ASR) and speech ...
Sign languages are visual languages using manual articulations and non-m...
Direct speech-to-speech translation (S2ST) is an attractive research top...
The rapid development of single-modal pre-training has prompted research...
How to boost speech pre-training with textual data is an unsolved proble...
This paper describes the submission of our end-to-end YiTrans speech tra...
Transformer has been successfully applied to speech separation recently ...
Recently, self-supervised learning (SSL) has demonstrated strong perform...
Previous speech pre-training methods, such as wav2vec2.0 and HuBERT, pre...
This paper studies a novel pre-training technique with unpaired speech d...
Recently, pioneering work has found that speech pre-trained models can solve fu...
Multi-talker conversational speech processing has drawn much interest f...
Self-supervised learning (SSL) achieves great success in speech recognit...
The advances in attention-based encoder-decoder (AED) networks have brou...
The speech representations learned from large-scale unlabeled data have ...
Self-supervised learning (SSL) is a long-standing goal for speech proces...
Although pre-training models have achieved great success in dialogue gen...
Multilingual automatic speech recognition (ASR) models have shown great ...
Speech separation has been successfully applied as a frontend processing...
In this paper, we propose a unified pre-training approach called UniSpee...
With its strong modeling capacity that comes from a multi-head and multi...
This paper describes the Microsoft speaker diarization system for monaur...
Recently, Transformer based end-to-end models have achieved great succes...
Evaluation metrics play a vital role in the growth of an area as they defi...
Pre-trained models for programming languages have achieved dramatic empir...
Continuous speech separation plays a vital role in complicated speech re...
Recently, there has been a strong push to transition from hybrid models ...
To speed up the inference of neural speech synthesis, non-autoregressive...
End-to-end speech translation poses a heavy burden on the encoder, becau...
Non-task oriented dialogue systems have achieved great success in recent...
The attention-based encoder-decoder model has achieved impressive results fo...
Context modeling has a pivotal role in open domain conversation. Existin...
End-to-end speech translation, a hot topic in recent years, aims to tran...
Due to its highly parallelizable architecture, the Transformer is faster to ...
Recently, the Transformer has achieved state-of-the-art performance on m...
Pre-training has proven to be effective in unsupervised machine translat...
Without a real bilingual corpus available, unsupervised Neural Machine Tra...
Although end-to-end neural text-to-speech (TTS) methods (such as Tacotro...
Sequence-to-Sequence models were introduced to tackle many real-life pro...