Recently, speaker-attributed automatic speech recognition (SA-ASR) has
a...
For speech interaction, voice activity detection (VAD) is often used as ...
In this paper, we propose a joint generative and contrastive representat...
In this paper, we propose an effective sound event detection (SED) metho...
Speaker-attributed automatic speech recognition (SA-ASR) in multiparty
m...
Self-supervised pre-training methods based on contrastive learning or
re...
Speech enhancement (SE) is usually required as a front end to improve th...
Unpaired data has been shown to be beneficial for low-resource automatic spee...
With the advance in self-supervised learning for audio and visual modali...
In this work, we propose a bi-directional long short-term memory (BiLSTM...
Wav2vec2.0 is a popular self-supervised pre-training framework for learn...
In this paper, we propose a weakly supervised multilingual representatio...
In this paper, we propose a novel deep learning architecture to improvin...
In this paper, we propose a visual embedding approach to improving embed...
With the development of automatic speech recognition (ASR) and text-to-s...
This paper presents an adversarial learning method for recognition-synth...
This paper proposes an end-to-end emotional speech synthesis (ESS) metho...
In this paper, a method for non-parallel sequence-to-sequence (seq2seq) ...
This paper presents a method of using autoregressive neural networks for...
This paper presents methods of making use of text supervision to impro...
In this paper, a neural network named Sequence-to-sequence ConvErsion
N...
This paper proposes a forward attention method for the sequence-to-seque...
This paper presents a waveform modeling and generation method using
hier...