Guitar tablature is a form of music notation widely used among guitarist...
The rise of singing voice synthesis presents critical challenges to arti...
Individualized head-related transfer functions (HRTFs) are crucial for a...
In this paper, we aim to address the problem of channel robustness in sp...
Non-linguistic filler words, such as "uh" or "um", are prevalent in spon...
Voice anti-spoofing systems are crucial auxiliaries for automatic speake...
Head-related transfer functions (HRTFs) are a set of functions describin...
Recent developments in neural speech synthesis and vocoding have sparked...
In the growing field of virtual auditory display, personalized head-rela...
Active speaker detection (ASD) systems are important modules for analyzi...
Full supervision models for source separation are trained on mixture-sou...
Guitar tablature transcription is an important but understudied problem ...
The performance of automatic speaker verification (ASV) systems could be...
Inferring music time structures has a broad range of applications in mus...
In this paper, we conduct a cross-dataset study on parametric and non-pa...
In recent years, filterbank learning has become an increasingly popular ...
The online estimation of rhythmic information, such as beat positions, d...
In this paper, we present the UR-AIR system submission to the logical access...
Separating a song into vocal and accompaniment components is an active r...
Spoofing countermeasure (CM) systems are critical in speaker verificatio...
Online beat tracking (OBT) has always been a challenging task. Due to th...
Human voices can be used to authenticate the identity of the speaker, bu...
State-of-the-art text-independent speaker verification systems typically...
The applications of short-term user-generated video (UGV), such as Snapc...
Visual emotion expression plays an important role in audiovisual speech ...
This paper presents a deep reinforcement learning algorithm for online a...
This paper proposes a deep multi-speaker text-to-speech (TTS) model for ...
Searching sounds by text labels is often difficult, as text descriptions...
We devise a cascade GAN approach to generate talking face video, which i...
Cross-modality generation is an emerging topic that aims to synthesize d...
The presence of a corresponding talking face has been shown to significa...
In this paper, we introduce a novel problem of audio-visual event locali...
Cross-modal audio-visual perception has been a long-lasting topic in psy...
We introduce a dataset for facilitating audio-visual analysis of musical...