Huaming Wang

09/14/2023

Training Audio Captioning Models without Audio

Automated Audio Captioning (AAC) is the task of generating natural langu...

Soham Deshmukh, et al.

09/11/2023

Natural Language Supervision for General-Purpose Audio Representations

Audio-Language models jointly learn multimodal text and audio representa...

Benjamin Elizalde, et al.

05/19/2023

Pengi: An Audio Language Model for Audio Tasks

In the domain of audio processing, Transfer Learning has facilitated the...

Soham Deshmukh, et al.

01/05/2023

Neural Codec Language Models are Zero-Shot Text to Speech Synthesizers

We introduce a language modeling approach for text to speech synthesis (...

Chengyi Wang, et al.

12/29/2022

Learning to mask: Towards generalized face forgery detection

Generalizability to unseen forgery types is crucial for face forgery det...

Jianwei Fei, et al.

12/27/2022

General GAN-generated image detection by data augmentation in fingerprint domain

In this work, we investigate improving the generalizability of GAN-gener...

Huaming Wang, et al.

11/14/2022

Describing emotions with acoustic property prompts for speech emotion recognition

Emotions lie on a broad continuum and treating emotions as a discrete nu...

Hira Dhamyal, et al.

11/04/2022

Real-Time Joint Personalized Speech Enhancement and Acoustic Echo Cancellation with E3Net

Personalized speech enhancement (PSE), a process of estimating a clean t...

Sefik Emre Eskimez, et al.

09/28/2022

Audio Retrieval with WavText5K and CLAP Training

Audio-Text retrieval takes a natural language query to retrieve relevant...

Soham Deshmukh, et al.

06/09/2022

CLAP: Learning Audio Concepts From Natural Language Supervision

Mainstream Audio Analytics models are trained to learn under the paradig...

Benjamin Elizalde, et al.

04/02/2022

Fast Real-time Personalized Speech Enhancement: End-to-End Enhancement Network (E3Net) and Knowledge Distillation

This paper investigates how to improve the runtime speed of personalized...

Manthan Thakker, et al.

10/20/2021

One model to enhance them all: array geometry agnostic multi-channel personalized speech enhancement

With the recent surge of video conferencing tools usage, providing high-...

Hassan Taherian, et al.

10/18/2021

Personalized Speech Enhancement: New Models and Comprehensive Evaluation

Personalized speech enhancement (PSE) models utilize additional cues, su...

Sefik Emre Eskimez, et al.

12/10/2019

Advances in Online Audio-Visual Meeting Transcription

This paper describes a system that generates speaker-annotated transcrip...

Takuya Yoshioka, et al.

05/09/2018

Attention-Aware Compositional Network for Person Re-identification

Person re-identification (ReID) is to identify pedestrians observed from...

Jing Xu, et al.

03/29/2018

Cracking the cocktail party problem by multi-beam deep attractor network

While recent progresses in neural network approaches to single-channel s...

Zhuo Chen, et al.
