Large self-supervised models are effective feature extractors, but their...
Self-supervised speech representation learning (S3RL) is revolutionizing...
Single-channel speech separation is required for multi-speaker speech re...
In recent years, with the progress of deep learning technologies, crowd ...
Robustness against noise is critical for keyword spotting (KWS) in real-...
Most current speech enhancement models use spectrogram features that req...
Multimodal affective computing, learning to recognize and interpret huma...
In this paper, we present a novel deep multimodal framework to predict h...