Deriving multimodal representations of audio and lexical inputs is a cen...
When interacting with smart devices such as mobile phones or wearables, ...
Dysfluencies and variations in speech pronunciation can severely degrade...
The ability to automatically detect stuttering events in speech could he...
In this paper, we address the task of determining whether a given uttera...
Audiovisual speech synthesis is the problem of synthesizing a talking fa...
We present an introspection of an audiovisual speech enhancement model. ...
Emotion plays an essential role in human-to-human communication, enablin...
Automatic speech transcription and speaker recognition are usually treat...