We consider speech enhancement for signals picked up in one noisy enviro...
Several recent works have adapted Masked Autoencoders (MAEs) for learnin...
Audio and visual modalities are inherently connected in speech signals: ...
In this paper we derive a Probably Approximately Correct (PAC)-Bayesian ...
In this paper we derive a PAC-Bayesian-like error bound for a class of
s...
In the context of keyword spotting (KWS), the replacement of handcrafted...
By utilizing the fact that speaker identity and content vary on differen...
In recent years, significant progress has been made in deep model-based
...
The intelligibility and quality of speech from a mobile phone or public
...
In recent years, the development of accurate deep keyword spotting (KWS)...
Environmental scene reconstruction is of great interest for autonomous
r...
Since electromagnetic signals are omnipresent, Radio Frequency (RF)-sens...
The ICASSP 2022 Multi-channel Multi-party Meeting Transcription Grand
Ch...
Deep representation learning has gained significant momentum in advancin...
Spoken keyword spotting (KWS) deals with the identification of keywords ...
This paper considers speech enhancement of signals picked up in one nois...
Age of Information (AoI) reflects the time that is elapsed from the
gene...
A central use case for the Internet of Things (IoT) is the adoption of
s...
In this short article, we showcase the derivation of an optimal predicto...
Voice activity detection (VAD) remains a challenge in noisy environments...
The ConferencingSpeech 2021 challenge is proposed to stimulate research ...
In recent years, speech processing algorithms have seen tremendous progr...
In this paper we derive a PAC-Bayesian error bound for autonomous stocha...
In this paper, we propose a novel method that trains pass-phrase specifi...
In this letter, we propose a vocal tract length (VTL) perturbation metho...
Sensing capability is one of the most highlighted new features of future ...
The loss function is a key component in deep learning models. A commonly...
Due to lack of data, overfitting ubiquitously exists in real-world
appli...
In this paper, we present a deep-learning-based framework for audio-visu...
Speech enhancement and speech separation are two related tasks, whose pu...
In this work, we present the system description of the UIAI entry for th...
Data augmentation is commonly used for generating additional data from t...
One of the beyond-5G developments that is often highlighted is the
integ...
Despite their great performance over the years, handcrafted speech featu...
Applying x-vectors for speaker verification has recently attracted great...
A deep neural network of multiple nonlinear layers forms a large functio...
Both acoustic and visual information influence human perception of speec...
This paper studies context-aware recommendations in the television domai...
Machine Learning systems are vulnerable to adversarial attacks and will
...
This paper proposes a deep learning-based method for learning joint
cont...
Many deep learning-based speech enhancement algorithms are designed to
m...
Keyword spotting (KWS) is experiencing an upswing due to the pervasivene...
This paper presents an unsupervised segment-based method for robust voic...
When speaking in the presence of background noise, humans reflexively change...
There are a number of studies about extraction of bottleneck (BN) featur...
Attention level estimation systems have a high potential in many use cas...
Humans tend to change their way of speaking when they are immersed in a ...
Audio-visual speech enhancement (AV-SE) is the task of improving speech
...
Home entertainment systems feature in a variety of usage scenarios with ...
Although speech enhancement algorithms based on deep neural networks (DN...