End-to-end Networks for Supervised Single-channel Speech Separation

10/05/2018
by Shrikant Venkataramani, et al.

The performance of single-channel source separation algorithms has improved greatly with the development and deployment of neural networks. However, many such networks still operate on the magnitude spectrogram of a mixture and produce estimates of the source magnitude spectrograms in order to perform separation. In this paper, we interpret these steps as additional neural network layers and propose an end-to-end source separation network that estimates the separated speech waveform by operating directly on the raw waveform of the mixture. We also propose masking-based end-to-end separation networks that jointly optimize the mask and the latent representations of the mixture waveforms. In our experiments, these networks show a significant improvement in separation performance over existing architectures. To train these end-to-end models, we investigate composite cost functions derived from objective evaluation metrics measured on waveforms. We present subjective listening test results that demonstrate the improvement attained by masking-based end-to-end networks and reveal insights into the performance of these cost functions for end-to-end source separation.
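To make the pipeline described in the abstract concrete, the following is a minimal NumPy sketch of a masking-based forward pass on a raw waveform, together with a waveform-domain cost. It is an illustration under stated assumptions, not the authors' implementation. The function names (`frame`, `overlap_add`, `separate`, `neg_sdr`), the ReLU and sigmoid nonlinearities, the layer sizes, and the simple energy-ratio form of SDR are all hypothetical choices. Random matrices stand in for parameters that would be learned jointly in training.

```python
import numpy as np

def frame(x, win, hop):
    """Slice a 1-D signal into overlapping frames of shape (n_frames, win)."""
    n_frames = 1 + (len(x) - win) // hop
    idx = np.arange(win)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]

def overlap_add(frames, hop):
    """Sum frames back into a 1-D signal at multiples of `hop`."""
    n_frames, win = frames.shape
    y = np.zeros(win + hop * (n_frames - 1))
    for i, f in enumerate(frames):
        y[i * hop : i * hop + win] += f
    return y

def separate(mixture, enc, mask_w, dec, hop):
    """Analysis layer -> mask -> synthesis layer, all on the raw waveform.

    enc    : (win, n_basis) analysis basis (assumed stand-in for the STFT step)
    mask_w : (n_basis, n_basis) weights of a one-layer mask estimator (assumed)
    dec    : (n_basis, win) synthesis basis (assumed stand-in for the inverse step)
    """
    win = enc.shape[0]
    # Non-negative latent representation of each frame (ReLU is an assumption).
    latent = np.maximum(frame(mixture, win, hop) @ enc, 0.0)
    # Mask values in (0, 1), estimated from the latent representation.
    mask = 1.0 / (1.0 + np.exp(-(latent @ mask_w)))
    # Apply the mask, map back to frames, and overlap-add into a waveform.
    return overlap_add((mask * latent) @ dec, hop)

def neg_sdr(estimate, target, eps=1e-8):
    """Negative signal-to-distortion ratio in dB (a simple energy-ratio form)."""
    noise = target - estimate
    ratio = (np.sum(target ** 2) + eps) / (np.sum(noise ** 2) + eps)
    return -10.0 * np.log10(ratio)

# Usage with random stand-in parameters (hypothetical sizes).
rng = np.random.default_rng(0)
win, hop, n_basis = 64, 16, 128
enc = 0.1 * rng.standard_normal((win, n_basis))
mask_w = 0.1 * rng.standard_normal((n_basis, n_basis))
dec = 0.1 * rng.standard_normal((n_basis, win))

mixture = rng.standard_normal(1024)
target = rng.standard_normal(1024)
estimate = separate(mixture, enc, mask_w, dec, hop)
loss = neg_sdr(estimate, target)
```

Because every step from `mixture` to `loss` is built from differentiable operations, an automatic differentiation framework could update `enc`, `mask_w`, and `dec` together from the waveform-domain cost. That is the sense in which the mask and the latent representation are optimized jointly.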

