Improving Target Sound Extraction with Timestamp Information

04/02/2022
by Helin Wang, et al.

Target sound extraction (TSE) aims to extract the sound of a target sound event class from an audio mixture containing multiple sound events. Previous work has mainly focused on weakly labelled data, joint learning, and new classes; however, the onset and offset times of the target sound event, which are emphasized in auditory scene analysis, have been overlooked. In this paper, we use such timestamp information to help extract the target sound via a target sound detection network and a target-weighted time-frequency loss function. More specifically, we use the detection result of a target sound detection (TSD) network as additional information to guide the learning of the TSE network. We also find that the TSE result can in turn improve the performance of the TSD network, and we therefore propose a mutual learning framework for target sound detection and extraction. In addition, the target-weighted time-frequency loss function is designed to pay more attention to the temporal regions of the target sound during training. Experimental results on synthesized data generated from the Freesound Datasets show that the proposed method significantly improves TSE performance.
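The abstract does not give the form of the target-weighted time-frequency loss, so the following is only a minimal sketch of the general idea: a spectrogram error in which frames where the target is active count more than the remaining frames. The function name `target_weighted_tf_loss`, the squared-error criterion, the `target_weight` parameter, and the normalization are all assumptions made for illustration, not the authors' formulation.

```python
import numpy as np


def target_weighted_tf_loss(est_mag, ref_mag, target_frames, target_weight=2.0):
    """Weighted mean squared error over a magnitude spectrogram.

    est_mag, ref_mag: arrays of shape (freq_bins, frames).
    target_frames: boolean array of shape (frames,), True where the target
        sound is active (for example, derived from onset/offset timestamps).
    target_weight: hypothetical scalar applied to target-active frames;
        all other frames get weight 1.
    """
    est_mag = np.asarray(est_mag, dtype=float)
    ref_mag = np.asarray(ref_mag, dtype=float)
    frames = np.asarray(target_frames, dtype=bool)
    if est_mag.shape != ref_mag.shape or est_mag.shape[1] != frames.shape[0]:
        raise ValueError("spectrogram and frame-mask shapes do not match")

    # Per-frame weights, broadcast across all frequency bins.
    weights = np.where(frames, target_weight, 1.0)
    squared_error = (est_mag - ref_mag) ** 2

    # Normalize by the total weight so the value stays comparable
    # across different choices of target_weight.
    total_weight = est_mag.shape[0] * np.sum(weights)
    return float(np.sum(squared_error * weights[None, :]) / total_weight)
```

With `target_weight=1.0` this reduces to an ordinary mean squared error; larger values make errors inside the target's temporal region contribute more to the loss than errors elsewhere.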


research
12/01/2021

Environmental Sound Extraction Using Onomatopoeia

Onomatopoeia, which is a character sequence that phonetically imitates a...
research
12/19/2021

Detect what you want: Target Sound Detection

Human beings can perceive a target sound that we are interested in from ...
research
06/14/2021

Few-shot learning of new sound classes for target sound extraction

Target sound extraction consists of extracting the sound of a target aco...
research
04/05/2022

RaDur: A Reference-aware and Duration-robust Network for Target Sound Detection

Target sound detection (TSD) aims to detect the target sound from a mixt...
research
04/08/2022

SoundBeam: Target sound extraction conditioned on sound-class labels and enrollment clues for increased performance and continuous learning

In many situations, we would like to hear desired sound events (SEs) whi...
research
08/04/2023

Towards Improving Harmonic Sensitivity and Prediction Stability for Singing Melody Extraction

In deep learning research, many melody extraction models rely on redesig...
research
08/17/2020

Multi-Task Learning for Interpretable Weakly Labelled Sound Event Detection

Weakly Labelled learning has garnered a lot of attention in recent years d...
