Automated Audio Captioning (AAC) is the task of generating natural langu...
Audio-Language models jointly learn multimodal text and audio representa...
In the domain of audio processing, Transfer Learning has facilitated the...
We introduce a language modeling approach for text-to-speech synthesis (...
Generalizability to unseen forgery types is crucial for face forgery
In this work, we investigate improving the generalizability of GAN-gener...
Emotions lie on a broad continuum and treating emotions as a discrete nu...
Personalized speech enhancement (PSE), a process of estimating a clean t...
Audio-Text retrieval takes a natural language query to retrieve relevant...
Mainstream Audio Analytics models are trained to learn under the paradig...
This paper investigates how to improve the runtime speed of personalized...
With the recent surge in video conferencing tool usage, providing
Personalized speech enhancement (PSE) models utilize additional cues, su...
This paper describes a system that generates speaker-annotated transcrip...
Person re-identification (ReID) aims to identify pedestrians observed from...
While recent progress in neural network approaches to single-channel s...