Zhiyao Duan

research

∙ 09/16/2023

SynthTab: Leveraging Synthesized Data for Guitar Tablature Transcription

Guitar tablature is a form of music notation widely used among guitarist...

0 Yongyi Zang, et al. ∙

research

∙ 09/14/2023

SingFake: Singing Voice Deepfake Detection

The rise of singing voice synthesis presents critical challenges to arti...

0 Yongyi Zang, et al. ∙

research

∙ 07/27/2023

Mitigating Cross-Database Differences for Learning Unified HRTF Representation

Individualized head-related transfer functions (HRTFs) are crucial for a...

0 Yutong Wen, et al. ∙

research

∙ 06/06/2023

Phase perturbation improves channel robustness for speech spoofing countermeasures

In this paper, we aim to address the problem of channel robustness in sp...

0 Yongyi Zang, et al. ∙

research

∙ 03/11/2023

Transcription free filler word detection with Neural semi-CRFs

Non-linguistic filler words, such as "uh" or "um", are prevalent in spon...

0 Ge Zhu, et al. ∙

research

∙ 11/04/2022

SAMO: Speaker Attractor Multi-Center One-Class Learning for Voice Anti-Spoofing

Voice anti-spoofing systems are crucial auxiliaries for automatic speake...

0 Siwen Ding, et al. ∙

research

∙ 10/27/2022

HRTF Field: Unifying Measured HRTF Magnitude Representation with Neural Fields

Head-related transfer functions (HRTFs) are a set of functions describin...

0 You Zhang, et al. ∙

research

∙ 09/23/2022

ControlVC: Zero-Shot Voice Conversion with Time-Varying Controls on Pitch and Rhythm

Recent developments in neural speech synthesis and vocoding have sparked...

0 Meiying Chen, et al. ∙

research

∙ 07/28/2022

Predicting Global Head-Related Transfer Functions From Scanned Head Geometry Using Deep Learning and Compact Representations

In the growing field of virtual auditory display, personalized head-rela...

0 Yuxiang Wang, et al. ∙

research

∙ 06/21/2022

Rethinking Audio-visual Synchronization for Active Speaker Detection

Active speaker detection (ASD) systems are important modules for analyzi...

0 Abudukelimu Wuerkaixi, et al. ∙

research

∙ 04/19/2022

Music Source Separation with Generative Flow

Full supervision models for source separation are trained on mixture-sou...

0 Ge Zhu, et al. ∙

research

∙ 04/17/2022

A Data-Driven Methodology for Considering Feasibility and Pairwise Likelihood in Deep Learning Based Guitar Tablature Transcription Systems

Guitar tablature transcription is an important but understudied problem ...

0 Frank Cwitkowitz, et al. ∙

research

∙ 02/10/2022

A Probabilistic Fusion Framework for Spoofing Aware Speaker Verification

The performance of automatic speaker verification (ASV) systems could be...

0 You Zhang, et al. ∙

research

∙ 11/01/2021

A Novel 1D State Space for Efficient Music Rhythmic Analysis

Inferring music time structures has a broad range of applications in mus...

0 Mojtaba Heydari, et al. ∙

research

∙ 10/08/2021

A study of the robustness of raw waveform based speaker embeddings under mismatched conditions

In this paper, we conduct a cross-dataset study on parametric and non-pa...

0 Ge Zhu, et al. ∙

research

∙ 08/23/2021

Learning Sparse Analytic Filters for Piano Transcription

In recent years, filterbank learning has become an increasingly popular ...

7 Frank Cwitkowitz, et al. ∙

research

∙ 08/08/2021

BeatNet: CRNN and Particle Filtering for Online Joint Beat Downbeat and Meter Tracking

The online estimation of rhythmic information, such as beat positions, d...

0 Mojtaba Heydari, et al. ∙

research

∙ 07/26/2021

UR Channel-Robust Synthetic Speech Detection System for ASVspoof 2021

In this paper, we present UR-AIR system submission to the logical access...

0 Xinhui Chen, et al. ∙

research

∙ 07/01/2021

Audiovisual Singing Voice Separation

Separating a song into vocal and accompaniment components is an active r...

0 Bochen Li, et al. ∙

research

∙ 04/03/2021

An Empirical Study on Channel Effects for Synthetic Voice Spoofing Countermeasure Systems

Spoofing countermeasure (CM) systems are critical in speaker verificatio...

0 You Zhang, et al. ∙

research

∙ 11/05/2020

Do not look back: an online beat tracking method using RNN and enhanced particle filtering

Online beat tracking (OBT) has always been a challenging task. Due to th...

0 Mojtaba Heydari, et al. ∙

research

∙ 10/27/2020

One-class learning towards generalized voice spoofing detection

Human voices can be used to authenticate the identity of the speaker, bu...

0 You Zhang, et al. ∙

research

∙ 10/24/2020

Raw-x-vector: Multi-scale Time Domain Speaker Embedding Network

State-of-the-art text-independent speaker verification systems typically...

0 Ge Zhu, et al. ∙

research

∙ 09/14/2020

Themes Informed Audio-visual Correspondence Learning

The applications of short-term user-generated video (UGV), such as Snapc...

12 Runze Su, et al. ∙

research

∙ 08/08/2020

Speech Driven Talking Face Generation from a Single Image and an Emotion Condition

Visual emotion expression plays an important role in audiovisual speech ...

0 Sefik Emre Eskimez, et al. ∙

research

∙ 02/08/2020

RL-Duet: Online Music Accompaniment Generation Using Deep Reinforcement Learning

This paper presents a deep reinforcement learning algorithm for online a...

0 Nan Jiang, et al. ∙

research

∙ 10/29/2019

Spoofing Speaker Verification Systems with Deep Multi-speaker Text-to-speech Synthesis

This paper proposes a deep multi-speaker text-to-speech (TTS) model for ...

0 Mingrui Yuan, et al. ∙

research

∙ 07/19/2019

Sound Search by Text Description or Vocal Imitation?

Searching sounds by text labels is often difficult, as text descriptions...

0 Yichi Zhang, et al. ∙

research

∙ 05/09/2019

Hierarchical Cross-Modal Talking Face Generation with Dynamic Pixel-Wise Loss

We devise a cascade GAN approach to generate talking face video, which i...

8 Lele Chen, et al. ∙

research

∙ 03/28/2018

Lip Movements Generation at a Glance

Cross-modality generation is an emerging topic that aims to synthesize d...

0 Lele Chen, et al. ∙

research

∙ 03/26/2018

Generating Talking Face Landmarks from Speech

The presence of a corresponding talking face has been shown to significa...

0 Sefik Emre Eskimez, et al. ∙

research

∙ 03/23/2018

Audio-Visual Event Localization in Unconstrained Videos

In this paper, we introduce a novel problem of audio-visual event locali...

0 Yapeng Tian, et al. ∙

research

∙ 04/26/2017

Deep Cross-Modal Audio-Visual Generation

Cross-modal audio-visual perception has been a long-lasting topic in psy...

0 Lele Chen, et al. ∙

research

∙ 12/27/2016

Creating A Multi-track Classical Musical Performance Dataset for Multimodal Music Analysis: Challenges, Insights, and Applications

We introduce a dataset for facilitating audio-visual analysis of musical...

0 Bochen Li, et al. ∙
