Zero-shot Audio Source Separation through Query-based Learning from Weakly-labeled Data

12/15/2021
by Ke Chen, et al.

Deep learning techniques for separating audio into different sound sources face several challenges. Standard architectures require training separate models for different types of audio sources. Although some universal separators employ a single model to target multiple sources, they have difficulty generalizing to unseen sources. In this paper, we propose a three-component pipeline to train a universal audio source separator from a large but weakly-labeled dataset, AudioSet. First, we propose a transformer-based sound event detection system for processing weakly-labeled training data. Second, we devise a query-based audio separation model that leverages this data for model training. Third, we design a latent embedding processor to encode queries that specify audio targets for separation, allowing for zero-shot generalization. Our approach uses a single model for source separation of multiple sound types, and relies solely on weakly-labeled data for training. In addition, the proposed audio separator can be used in a zero-shot setting, separating types of audio sources that were never seen in training. To evaluate the separation performance, we test our model on MUSDB18, while training on the disjoint AudioSet. We further verify the zero-shot performance by conducting another experiment on audio source types that are held out from training. In both cases, the model achieves Source-to-Distortion Ratio (SDR) performance comparable to that of current supervised models.
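For readers unfamiliar with the SDR metric mentioned above, the following is a minimal sketch of a plain energy-ratio SDR in decibels. It assumes the simple definition 10·log10(‖s‖² / ‖s − ŝ‖²) and is not necessarily the exact evaluation procedure used in the paper (MUSDB18 results are commonly reported with BSS-Eval-style tooling, which differs in detail). The function name `sdr_db` and the `eps` parameter are hypothetical, introduced only for illustration.

```python
import numpy as np


def sdr_db(reference, estimate, eps=1e-10):
    """Simplified source-to-distortion ratio in dB (illustrative assumption).

    reference: ground-truth source waveform
    estimate:  separated waveform, same shape as reference
    eps:       small constant to avoid division by zero or log of zero
    """
    reference = np.asarray(reference, dtype=np.float64)
    estimate = np.asarray(estimate, dtype=np.float64)
    # Energy of the target source.
    signal_energy = np.sum(reference ** 2)
    # Energy of everything in the estimate that deviates from the target.
    error_energy = np.sum((reference - estimate) ** 2)
    return 10.0 * np.log10((signal_energy + eps) / (error_energy + eps))


# Usage: an estimate that is the reference scaled by 0.9 leaves a residual
# with 1% of the reference energy, i.e. roughly 20 dB under this definition.
t = np.linspace(0.0, 1.0, 16000, endpoint=False)
reference = np.sin(2.0 * np.pi * 440.0 * t)
score = sdr_db(reference, 0.9 * reference)
```

A higher value means the estimate is closer to the reference source; under this simplified definition a perfect estimate is bounded only by `eps`.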

